LongWriter AI breaks the 10,000-word barrier, challenges human authors

Researchers at Tsinghua University in Beijing have developed a new artificial intelligence system that can produce coherent texts of more than 10,000 words. The advance could change how long-form documents are written across many fields.

The system, described in a paper called “LongWriter: Unleash 10,000+ Word Generation from Long Context LLMs,” tackles a persistent challenge in AI: generating long-form, high-quality written content. This development could have far-reaching implications for tasks ranging from academic writing to fiction, and could reshape content creation in the digital age.

The research team, led by Yushi Bai, found that the output length of an AI model directly correlates with the length of the texts it encounters during training. “We find that the effective generation length of the model is inherently bounded by the sample it has seen during supervised fine-tuning,” the researchers explain. This insight led them to create “LongWriter-6k,” a dataset of 6,000 writing samples ranging from 2,000 to 32,000 words.
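If a model's output length is bounded by the outputs it saw during fine-tuning, a simple sanity check before training is to measure how long the target texts in an SFT dataset actually are. The sketch below is a hypothetical illustration, not code from the LongWriter paper or repository: the `output_length_profile` helper and the `"output"` field name are assumptions, and splitting on whitespace gives only a rough word count.

```python
from statistics import median


def output_length_profile(samples):
    """Summarize word counts of target outputs in a list of SFT samples.

    Hypothetical format: each sample is a dict whose "output" key holds
    the response text the model is trained to produce.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Rough word count: split each output on whitespace.
    lengths = sorted(len(s["output"].split()) for s in samples)
    return {
        "count": len(lengths),
        "min": lengths[0],
        "median": median(lengths),
        # The longest output seen is, per the paper's finding, roughly
        # the ceiling on what the fine-tuned model will learn to write.
        "max": lengths[-1],
        "share_over_2000": sum(n > 2000 for n in lengths) / len(lengths),
    }


# Toy dataset with outputs of 500, 1,500, 2,500 and 12,000 words.
toy = [{"output": "word " * n} for n in (500, 1500, 2500, 12000)]
profile = output_length_profile(toy)
print(profile)
```

A dataset whose `max` sits near 2,000 words would, on this reasoning, leave a model stuck near that length; adding samples like LongWriter-6k's 2,000 to 32,000-word texts raises the ceiling.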

By feeding their AI model this data-rich diet during training, the team scaled its maximum output length from around 2,000 words to over 10,000 words. Their 9-billion-parameter model even outperformed larger, proprietary models on long-text generation tasks.

LongWriter-glm4-9b from @thukeg can generate more than 10,000 words at a time!?
The paper identifies a problem with current long-context LLMs: they can process inputs up to 100,000 tokens, but struggle to generate outputs longer than 2,000 words.
The article proposes that a… photo.twitter.com/2jfKyIpShK
— Gradio (@Gradio) August 14, 2024

A double-edged pen: opportunities and challenges

This breakthrough could transform industries that rely on long-form content. Publishers could use AI to generate early drafts of books or reports. Marketing agencies could produce in-depth white papers or case studies more efficiently. Education technology companies could develop AI tutors that produce comprehensive study materials.

However, the technology also poses significant challenges. The ability to generate vast amounts of human-like text could exacerbate problems with misinformation and spam. Content creators and journalists could face increased competition from AI-generated articles. Academic institutions will need to refine plagiarism detection tools to identify AI-authored papers.

Comparative performance of leading AI language models, including proprietary and open-source options, alongside the new LongWriter models from Tsinghua University. The table shows that LongWriter-9B-DPO outperforms other models in overall scores and excels in generating longer texts from 4,000 to 20,000 words. (credit: github.com)

The ethical implications are equally profound. As AI-generated text becomes indistinguishable from human-written content, questions of authorship, creativity, and intellectual property become more complex. Long-form AI writing capabilities could also affect human language skills, either enhancing creativity or causing writing skills to atrophy.

Rewriting the Future: Implications for Society and Industry

The researchers have open-sourced their code and models on GitHub, allowing other developers to build on their work. They also released a demonstration video in which their model generates a coherent 10,000-word travel guide to China from a simple prompt, highlighting the technology’s potential for producing detailed, structured content.

As AI advances, the line between human-generated and machine-generated text continues to blur. This breakthrough in long-form text generation represents not only a technical achievement, but also a turning point that could change our relationship with written communication.

The challenge now lies in deploying this technology responsibly. Policymakers, ethicists and technologists need to work together to develop frameworks for the ethical use of AI-generated content. Education systems may need to evolve to emphasize skills that complement AI capabilities rather than compete with them.

As we enter this new era of AI-assisted writing, the written word, long considered a uniquely human domain, is venturing into uncharted territory. The implications of this shift are likely to resonate across society and impact how we create, consume, and value written content in the years to come.