
Putting User Assets Back in the Hands of Users: LLMs, Open Formats, and Controllable Toolchains


A small observation about AI poster and slide generators led me to a much larger question: who actually controls the assets we create?

At first glance, the market looks full of convenient tools. There are AI poster generators, AI slide makers, AI document editors, AI diagram tools, AI writing platforms, and many polished Software as a Service (SaaS) products that promise to turn a prompt into a beautiful output. For many users, this is attractive. You type a sentence, pick a template, drag a few objects, and export a PDF.

But once I looked at the problem from the perspective of a geek, the surface convenience became less interesting than the underlying structure.

A poster is not just a picture. A technical diagram is not just decoration. A paper is not just formatted text. These are user assets. They encode ideas, methods, assumptions, structures, relationships, workflows, evidence, and arguments. If these assets are created inside a black-box platform, then the user may be able to edit them visually, but does not fully own the logic by which they are represented, rendered, transformed, and preserved.

That distinction matters.

The problem is not whether AI can generate a poster

Many AI poster tools probably do not rely purely on image generation. Behind the scenes, they likely use some internal object model: text boxes, shapes, panels, coordinates, fonts, colors, and layout constraints. The final poster may be editable in a web GUI. In that narrow sense, they are not merely producing a dead image.

But that does not solve the real problem.

The issue is not whether a poster consists of editable objects. The issue is whether the user has access to the object model, the file format, the rendering logic, and the toolchain that turns source representation into visual output.

A closed SaaS platform may give me objects to drag. But those objects live inside someone else's system. The platform controls the schema, the renderer, the export logic, the templates, the pricing, and the future compatibility of my work. I can edit what the platform allows me to edit. I can export what the platform allows me to export. I can preserve what the platform allows me to preserve.

That is not the same as owning the asset.

This is why I became interested in a very different workflow:

Natural language intention -> LLM-assisted source editing -> open structured file -> open renderer/editor -> human visual refinement -> portable output.

In practice, one concrete version of this is:

LLM -> draw.io XML -> diagrams.net/draw.io GUI -> SVG/PDF/PNG export.

At first, this sounds like a cheap trick: instead of paying for an AI poster SaaS, ask an LLM to edit a draw.io XML file directly.

But the deeper point is not cost. The deeper point is control.
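To make the draw.io step concrete, here is a minimal sketch of the kind of file an LLM would be asked to write or edit. The element and attribute names (mxfile, diagram, mxGraphModel, mxCell, mxGeometry) follow the uncompressed draw.io XML layout as I understand it. The helper name build_drawio, the cell ids, the sizes, and the style strings are illustrative assumptions, so check the result by opening it in the editor.

```python
import xml.etree.ElementTree as ET


def build_drawio(boxes, edges):
    """Return an uncompressed draw.io document as an XML string.

    boxes: list of (cell_id, label, x, y); edges: list of (source_id, target_id).
    Hypothetical helper for illustration, not part of any draw.io API.
    """
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", {"name": "Page-1"})
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    # Cells "0" and "1" are the conventional root cell and default layer.
    ET.SubElement(root, "mxCell", {"id": "0"})
    ET.SubElement(root, "mxCell", {"id": "1", "parent": "0"})
    for cell_id, label, x, y in boxes:
        cell = ET.SubElement(root, "mxCell", {
            "id": cell_id,
            "value": label,
            "style": "rounded=1;whiteSpace=wrap;html=1;",  # assumed style string
            "vertex": "1",
            "parent": "1",
        })
        ET.SubElement(cell, "mxGeometry", {
            "x": str(x), "y": str(y),
            "width": "160", "height": "60", "as": "geometry",
        })
    for index, (source_id, target_id) in enumerate(edges):
        cell = ET.SubElement(root, "mxCell", {
            "id": f"e{index}",
            "style": "endArrow=classic;html=1;",  # assumed style string
            "edge": "1",
            "parent": "1",
            "source": source_id,
            "target": target_id,
        })
        ET.SubElement(cell, "mxGeometry", {"relative": "1", "as": "geometry"})
    return ET.tostring(mxfile, encoding="unicode")


xml_text = build_drawio(
    boxes=[("intent", "Natural language intention", 40, 40),
           ("source", "Open structured file", 280, 40)],
    edges=[("intent", "source")],
)
print(xml_text)
```

Saving xml_text to a .drawio file should give something the GUI can open and refine by hand. The point of the sketch is that every box and arrow is a plain, diffable XML element.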

XML alone is not enough

It is tempting to say that draw.io works well with LLMs because its file format is XML. XML is text. LLMs can read text, modify text, and produce text. Therefore, LLMs can edit draw.io files.

That explanation is only half correct.

Many commercial formats are also structured. A .docx file, for example, is a ZIP package of XML parts (the Office Open XML format), not an opaque binary blob like the older .doc format. In principle, one can inspect it, modify it, and generate it.
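As a small illustration of "one can inspect it", the sketch below lists the XML parts inside a .docx-style package using only the Python standard library. To stay self-contained it builds a tiny stand-in archive in memory. The part names mirror ones commonly found in real Word files, but the contents are placeholders, not valid Word markup.

```python
import io
import zipfile


def list_xml_parts(docx_file):
    """Return the sorted names of the XML parts inside a .docx package (a ZIP archive)."""
    with zipfile.ZipFile(docx_file) as package:
        return sorted(name for name in package.namelist() if name.endswith(".xml"))


# Stand-in archive for illustration only; a real .docx has more parts
# and real WordprocessingML content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("word/document.xml", "<document/>")
    package.writestr("word/styles.xml", "<styles/>")
buffer.seek(0)

parts = list_xml_parts(buffer)
print(parts)
```

Passing the path of an actual .docx file to list_xml_parts works the same way. Listing the parts is the easy step; knowing how Word will interpret them is the hard one.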

But anyone who has dealt seriously with Word documents knows the problem: readable does not mean predictable.

Word processing formats carry decades of historical compatibility. The specification may contain behaviors inherited from old versions, compatibility modes, layout conventions, default settings, and implementation-dependent interpretations. You may edit the XML, but the GUI may not behave as expected. You may change a paragraph style, only to discover that numbering, spacing, table layout, floating objects, fonts, or pagination are affected by rules hidden somewhere else.

The file is structured, but the system is not fully transparent.

This is the critical distinction:

A structured format is not the same as a controllable toolchain.

For LLM collaboration, the problem is not simply whether the file can be read. The problem is whether the LLM can reason about how the file will be interpreted.

In the case of draw.io, the advantage is not merely that the .drawio file is XML. The important point is that the editor and renderer are open enough for the behavior of the file to be studied, debugged, and understood. If something goes wrong in the XML, one can inspect not only the file, but also the logic that interprets the file.

That changes the nature of LLM assistance.

Without access to the rendering logic, the LLM is guessing. With access to the source format and the implementation, the LLM can work within an intelligible graphical language.

XML provides editability. Open implementation provides explainability. The GUI provides human correction. The export formats provide portability.

The power comes from the combination.

Why LaTeX still matters

The same idea also applies to documents. This is why LaTeX has remained important for serious technical writing.

LaTeX is not valuable simply because it produces beautiful PDFs. Its deeper value is that it turns documents into source files. A paper becomes something that can be edited as text, version-controlled, searched, diffed, compiled, debugged, and reproduced.
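A quick sketch of what "diffed" means in practice: because the paper is plain text, a revision shows up as a reviewable line-level change. The LaTeX lines and the file labels below are made up for illustration. The diff itself comes from Python's standard difflib module.

```python
import difflib

# Two hypothetical versions of the same LaTeX fragment.
before = [
    r"\section{Method}",
    r"We train the model for 10 epochs.",
    r"Results are shown in Figure~\ref{fig:loss}.",
]
after = [
    r"\section{Method}",
    r"We train the model for 20 epochs.",
    r"Results are shown in Figure~\ref{fig:loss}.",
]

diff = list(difflib.unified_diff(
    before, after,
    fromfile="paper.tex (old)", tofile="paper.tex (new)",
    lineterm="",
))
print("\n".join(diff))
```

The same kind of diff is what a version control system records for each commit, and it is also a compact way to review an edit an LLM proposes before accepting it.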

The workflow is transparent in a way that GUI-based document editing often is not. There may still be complexity. LaTeX templates can be painful. Packages can interact in obscure ways. Compilation errors can be ugly. But the document is fundamentally source-driven.

That makes it a natural partner for LLMs.

LLMs are much better at modifying source representations than manipulating opaque GUIs. They can revise Markdown, LaTeX, XML, SVG, YAML, Julia code, configuration files, and structured documents. They can explain diffs, trace errors, update repeated patterns, and refactor text. They can operate where the work is represented as explicit structure.

This is why I currently prefer Quarto for many documents.

Pure LaTeX is powerful, but it can be unnecessarily heavy for everyday writing. Quarto provides a practical front end: Markdown for readable writing, YAML for metadata, Pandoc for conversion, and LaTeX or other backends when needed. It preserves the source-file philosophy while reducing the friction.

In that sense, Quarto is not a rejection of LaTeX. It is a more ergonomic layer over the same general principle:

Keep the core asset as an open, text-based, version-controllable source file.
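For readers who have not seen one, here is a sketch of a minimal Quarto source file, written from Python so it can be generated or regenerated by a script. The YAML keys title and format (with html and pdf targets) are standard Quarto front matter. The file name note.qmd, the helper write_qmd, and the body text are assumptions made for this example.

```python
from pathlib import Path
import tempfile

# YAML metadata between the --- lines, then ordinary Markdown.
QMD_TEMPLATE = """---
title: "{title}"
format:
  html: default
  pdf: default
---

## Introduction

{body}
"""


def write_qmd(directory, title, body):
    """Write a small Quarto document and return its path (hypothetical helper).

    The title is inserted verbatim, so it should not contain double quotes.
    """
    path = Path(directory) / "note.qmd"
    path.write_text(QMD_TEMPLATE.format(title=title, body=body), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    qmd_path = write_qmd(tmp, "Poster notes", "Draft text goes here.")
    text = qmd_path.read_text(encoding="utf-8")

print(text)
```

With Quarto installed, running quarto render note.qmd on such a file should produce the listed output formats. The asset that gets versioned and edited is the small text file, not the rendered output.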

LLMs should edit source files, not trap us inside GUIs

The strongest form of LLM-assisted work is not "click a magic button in a platform."

It is this:

The user expresses intent. The LLM translates that intent into modifications of open source files. The open toolchain renders or compiles those files. The user inspects and corrects the result. The final asset remains portable and maintainable.

This is fundamentally different from using an AI feature inside a closed SaaS editor.

In a closed platform, the AI is embedded inside the vendor's environment. The output is often a platform object. The history, schema, rendering, and future compatibility are controlled externally.

In a workflow built on open, text-based source files, the LLM is a replaceable assistant. It does not own the document. It does not own the diagram. It does not own the project. It only helps transform an asset that remains under the user's control.

That is the key architectural difference.

  • For posters, this means LLMs should generate or modify draw.io XML, SVG, Mermaid, Graphviz, Typst, or other open representations, rather than only producing flat images or platform-locked designs.

  • For documents, this means LLMs should work on Markdown, Quarto, LaTeX, BibTeX, CSL, YAML, and related source files, rather than only operating inside Word or a proprietary cloud editor.

  • For software, this means LLMs should work with code, tests, documentation, CI, and reproducible environments, rather than only producing isolated snippets.

Output should be source-like, not screenshot-like

A screenshot is dead. A source file is alive.

A screenshot can be viewed, but not meaningfully edited. A source file can be modified, regenerated, reviewed, and reused. A screenshot captures a moment. A source file captures a process.

This matters especially for technical work, because communication is iterative. Figures change. Terminology changes. Models change. Equations change. Experimental results change. Reviewers ask for revisions. Collaborators ask for different layouts. Posters need new dimensions. A method diagram becomes a figure in a paper, then a slide in a talk, then documentation for software.

If the asset is a flat image or trapped inside a SaaS platform, every revision becomes manual repair.

If the asset is source-like, revisions become engineering operations.
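One example of a revision as an engineering operation: a terminology change applied to every label in a diagram at once. The sketch assumes an uncompressed draw.io file whose labels live in the value attribute of mxCell elements. Files that wrap cells in other elements or store compressed diagram data would need extra handling. The sample diagram and the helper rename_term are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-box diagram in uncompressed draw.io XML.
SAMPLE = """<mxfile><diagram name="Page-1"><mxGraphModel><root>
<mxCell id="0"/><mxCell id="1" parent="0"/>
<mxCell id="2" value="Encoder network" style="rounded=1;" vertex="1" parent="1">
<mxGeometry x="40" y="40" width="160" height="60" as="geometry"/></mxCell>
<mxCell id="3" value="Encoder output" style="rounded=1;" vertex="1" parent="1">
<mxGeometry x="240" y="40" width="160" height="60" as="geometry"/></mxCell>
</root></mxGraphModel></diagram></mxfile>"""


def rename_term(xml_text, old, new):
    """Replace a term in every cell label; return (new_xml, number_of_cells_changed)."""
    tree = ET.fromstring(xml_text)
    changed = 0
    for cell in tree.iter("mxCell"):
        label = cell.get("value")
        if label and old in label:
            cell.set("value", label.replace(old, new))
            changed += 1
    return ET.tostring(tree, encoding="unicode"), changed


updated, count = rename_term(SAMPLE, "Encoder", "Feature extractor")
print(count)
```

Geometry and styles are untouched, so the layout survives the rename. In a flat image, the same change would mean redrawing every affected box.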

This is why "Poster-as-Code" is not just a slogan. It means treating visual communication as something closer to software or LaTeX: structured, inspectable, editable, and reproducible.

A poster should not be a one-time visual artifact. It should be a maintained representation of an argument.

Open source is not just ideology

It is easy to frame this as an open-source preference, but that is too shallow.

The real issue is not moral purity. The issue is whether the toolchain allows understanding, debugging, migration, and long-term maintenance.

Open source matters because it makes the interpreter visible.

For a document, the interpreter may be a LaTeX engine, Pandoc, or a rendering pipeline. For a diagram, the interpreter may be the draw.io/diagrams.net editor and renderer. For software, the interpreter may be a compiler, package ecosystem, test runner, or CI system.

If the source file is visible but the interpreter is hidden, the workflow remains partially black-boxed. The LLM can edit inputs, but it cannot fully understand why outputs behave as they do.

This is the lesson from Word-like systems. A document format may be technically inspectable, but if the actual rendering behavior depends on a closed, historically layered GUI engine, the result is difficult to predict.

By contrast, when both the source representation and the interpreting toolchain are open, LLM collaboration becomes much more powerful. Errors become traceable. Behavior becomes explainable. Outputs become reproducible. The user can intervene at every layer.

Open source is therefore not merely about avoiding license fees. It is about reducing epistemic dependence on tools whose behavior cannot be fully inspected.

For serious work, that is not a small issue.

Cost is only one part of the story

Of course, cost matters. Many students, independent creators, and small teams cannot casually pay for every commercial SaaS tool, design platform, document service, or specialized editor. The financial burden of modern tooling is real.

But the deeper cost is lock-in.

A cheap SaaS tool can become expensive if it captures your assets. A convenient platform can become costly if leaving it means losing editability, structure, metadata, or reproducibility. A polished editor can become a liability if its internal format changes, export quality degrades, or pricing model shifts.

This is why I do not think the goal is simply "never spend money."

A better principle is:

Spend money on replaceable resources, not on locking up irreplaceable user assets.

Compute is replaceable. API providers are replaceable. Models are replaceable. Hardware can be upgraded. Even a commercial LLM API can be acceptable if it is only used as a temporary intelligence layer.

But the core assets should remain portable: source files, open formats, version history, reproducible builds, editable diagrams, and local copies.

In my current workflow, an OpenAI API call may help modify a document or diagram. But the resulting asset is not stored inside OpenAI. It is not a proprietary AI-platform object. It is a Quarto file, a Markdown file, a draw.io XML file, a Julia source file, an SVG, or a PDF.

The LLM service can be replaced later by another API or by a local LLM running on my own GPU. The core workflow remains intact.

That is the difference between using a commercial service and being owned by one.
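The replaceability argument can be sketched as a small interface: the workflow depends only on "something that proposes an edit to a source string", not on a particular vendor. Everything named here (EditBackend, StubBackend, revise_file) is hypothetical, and the stub performs a trivial string replacement instead of calling any real model or API.

```python
from pathlib import Path
from typing import Protocol
import tempfile


class EditBackend(Protocol):
    """Anything that can turn (source, instruction) into revised source."""

    def propose_edit(self, source: str, instruction: str) -> str: ...


class StubBackend:
    """Stand-in for a hosted API or a local model; makes no network call."""

    def __init__(self, provider: str) -> None:
        self.provider = provider

    def propose_edit(self, source: str, instruction: str) -> str:
        # A real backend would send the source and instruction to a model.
        return source.replace("TODO", instruction)


def revise_file(path: Path, instruction: str, backend: EditBackend) -> str:
    """Apply a backend's proposed edit to a plain-text source file in place."""
    original = path.read_text(encoding="utf-8")
    revised = backend.propose_edit(original, instruction)
    path.write_text(revised, encoding="utf-8")
    return revised


with tempfile.TemporaryDirectory() as tmp:
    results = {}
    for backend in (StubBackend("cloud"), StubBackend("local")):
        doc = Path(tmp) / f"{backend.provider}.qmd"
        doc.write_text("# Notes\n\nTODO\n", encoding="utf-8")
        results[backend.provider] = revise_file(doc, "Add a summary.", backend)

print(results)
```

Swapping the provider changes one constructor call. The file on disk, its format, and its history stay the same.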

A personal workstation, not a SaaS dependency chain

The long-term direction is clear to me: a personal workstation built around open tools.

Today, the LLM layer may depend on a cloud API. Tomorrow, with local hardware, it may run locally. That transition should not require changing the structure of the core assets.

This is the architecture I care about.

Not "AI replaces tools." Not "LLMs generate everything." Not "LLMs make design automatic."

Rather:

LLMs become source-level collaborators inside an open, inspectable, user-controlled toolchain.

That is a much more durable vision.

Modern SaaS tools often sell convenience. Convenience is not bad. But convenience can hide a transfer of control.

The user gives the platform intention, content, structure, and data. The platform returns a polished output. In the short term, this feels efficient. In the long term, the user may discover that the real asset has never fully belonged to them.

Serious work should not work that way.

A user should be able to inspect the document source, understand the diagram representation, reproduce the figure, migrate the project, regenerate the output, and continue editing years later without depending on a vendor's web interface.

That is not nostalgia for old tools. It is a practical requirement for serious intellectual work.

AI makes this even more important. As AI systems become more capable, the temptation to work entirely inside closed AI platforms will increase. But the more powerful AI becomes, the more important it is to decide where its outputs land.

If AI produces platform-locked artifacts, then AI accelerates dependence. If AI produces open, source-level assets, then AI strengthens independence.

That choice is architectural.

Conclusion: put the asset first

The central question is not "Which AI tool can generate the best poster?"

The central question is:

Where does the asset live, and who controls it?

If the asset lives inside a closed platform, the user is renting convenience. If the asset lives in an open, inspectable, version-controllable source format, the user owns a durable piece of intellectual infrastructure.

This is why I care about workflows such as Quarto for documents and draw.io XML for diagrams. They allow LLMs to collaborate without taking control away from the user. They turn communication into something closer to engineering: transparent, reproducible, editable, and portable.

The goal is not merely to save money. The goal is to make money less relevant to the ability to create, express, revise, and preserve work.

Money can be spent on compute. Money can be spent on hardware. Money can be spent on replaceable services.

But money should not have to be spent to keep one's own assets editable, understandable, and alive.

That is the real promise of LLMs combined with open formats and open toolchains:

not automatic content generation, but the return of control.

User assets should belong to users.