AI Tools and Their Impact on Workplace Efficiency
Artificial intelligence (AI) tools are rapidly becoming integral partners in the workplace, assisting with tasks such as drafting emails, editing code, and managing intricate documents. However, a recent research paper indicates that over-relying on these systems might undermine the quality of work they are designed to enhance.
A study conducted by Microsoft Research reveals that large language models (LLMs) like ChatGPT and Claude can progressively damage documents when engaged in repetitive editing tasks. Researchers noted that in some situations, even leading-edge models “corrupt an average of 25% of document content” after prolonged use.
The findings raise critical concerns about the increasing trend of assigning responsibilities to AI systems with little oversight in professional environments.
The Promise and Hazards of AI Delegation
The concept of AI delegation is straightforward. Instead of individually editing documents, users provide guidelines and allow AI systems to execute the tasks. This strategy, occasionally referred to as “delegated work” or “vibe coding,” signifies a profound transformation in how professional tasks are completed. Trust is essential for this approach.
The researchers highlighted that delegation necessitates trust—the belief that the LLM will accurately perform the task without introducing mistakes into documents.
However, the study implies that such trust may be unfounded. Employing a benchmark identified as DELEGATE-52, the research team evaluated 19 diverse AI models across 52 specialised areas, including coding, accounting, music notation, and textile design. The objective was to simulate realistic workflows where documents undergo multiple edits over time.
According to the study, “Our findings indicate that current LLMs introduce considerable errors when editing work documents. Top models (Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4) on average lose 25% of document content after 20 delegated interactions, with an overall average degradation of 50% across all models.”
Minor Errors, Major Implications
A significant discovery is that AI systems do not always fail in obvious ways. Instead, they produce what the researchers refer to as “sparse but severe errors that quietly compromise documents.”
These errors may be simple, such as an incorrect number or a missing sentence. However, with repeated editing, these mistakes accumulate and alter the final output.
On average, degradation reached approximately 50% across all models tested by the end of lengthy workflows. Even top-tier systems exhibited reduced performance over time.
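The compounding effect can be sketched numerically: if each edit quietly corrupts even a small fraction of a document, the losses multiply across a long workflow. The sketch below is a simplified illustration, not the study's methodology, and the per-edit corruption rate is a hypothetical figure chosen so that roughly 25% of content is lost after 20 edits, in line with the average the paper reports for top models.

```python
# Illustrative sketch: how small per-edit corruption rates compound.
# The 1.4% per-edit loss rate is a hypothetical assumption, picked so
# that ~25% of content is gone after 20 delegated edits.

def remaining_content(per_edit_loss: float, edits: int) -> float:
    """Fraction of original content surviving after repeated edits,
    assuming each edit independently corrupts a fixed fraction."""
    return (1 - per_edit_loss) ** edits

loss_rate = 0.014  # ~1.4% of content corrupted per edit (assumption)
for n in (1, 5, 10, 20):
    print(f"after {n:2d} edits: {remaining_content(loss_rate, n):.0%} intact")
```

Under this toy model, a barely noticeable 1.4% error per interaction leaves only about three quarters of the document intact after 20 rounds, which is why errors that look minor in a single edit become serious over a long workflow.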
The paper stated, “Current LLMs are unreliable delegates,” highlighting that their effectiveness diminishes as the number of interactions increases.
The Detrimental Impact of Longer Workflows
The study emphasises a crucial concern: AI systems struggle with prolonged, multi-step tasks. Although many models excel in short interactions, their accuracy tends to plummet when tasks are chained together.
The researchers noted that “Short-term performance… does not always predict long-horizon performance.”
This is significant since most real-world work involves a series of steps. Documents are revised multiple times, not merely once. The issue escalates with larger and more complex files. More steps translate to greater opportunities for errors, and those errors accumulate over time.
One might assume that granting access to tools like code execution or file editing utilities would enhance AI accuracy. However, the study found the contrary.
Models employing tools demonstrated slightly inferior results. This is partly due to technical factors—using tools increases the amount of data the model must process, complicating the maintenance of consistency across different steps.
Variability Across Domains
The research also indicates that AI efficiency differs based on the type of task. Structured and rule-based areas, such as programming, show significantly better performance. Coding was the sole field where most models reliably managed delegated workflows.
In contrast, tasks that involve natural language or specialised formats, such as financial records or creative pieces, exhibited considerably higher rates of errors.
Implications for Work Environments
The findings arrive at a time when organisations are increasingly adopting AI in their everyday operations. From report generation to data management, these tools are frequently employed with minimal human oversight. The study indicates that this approach might require reconsideration.
The researchers cautioned that users “still need to closely monitor LLM systems as they operate,” particularly in tasks with higher stakes.
Despite these shortcomings, the researchers noted that progress is rapid. Newer models show marked improvements over previous versions, though they are not yet ready for fully unsupervised delegation.






