Working with Figures

TeXRA lets AI agents analyse, reference, and generate figures inside your documents — from .png screenshots to embedded TikZ diagrams and PDFs.

CLI

This page covers the VS Code Media selector. From the texra CLI, pass image and PDF files to an agent with --input / --context (a vision-capable model reads them directly), or describe the figure to a tool-use agent in texra chat.

Quick Task: Add a Figure Caption

  1. Select the polish agent from the agent dropdown.
  2. Pick a vision-capable model, e.g. gpt55, sonnet46T, gemini31p.
  3. Select your figure in the Media section.
  4. Type the instruction: "Write a detailed caption for this figure."
  5. Click Execute.

The Media Section

The main TeXRA panel includes a Media section for figure files. Its header carries the inline Auto Extract dropdown (the wand, which opens the auto-extract options) plus a three-action toolbar that adds opened files, clears all media files, and adds media files. Each file (for example figure1.pdf, plot.png, or schematic.tikz) is a row you can drag to reorder, with a trailing trash icon to remove it:

  • Auto Extract dropdown: configure automatic figure extraction.
  • Add opened files: append every open editor tab whose extension is a configured media type.
  • Clear all media files: empty the media list.
  • Add media files: open a file picker to append figures.
  • Drag-and-drop image, PDF, or audio files from anywhere onto the section.

Supported File Types

Configurable via texra.files.included.mediaExtensions:

  • Images (.png, .jpeg, .jpg, .gif, .heic, .heif, .webp): encoded and attached for vision-capable models.
  • Documents (.pdf): native multimodal on Anthropic / Gemini / OpenAI, otherwise rasterised.
  • Audio, experimental (.wav, .m4a, .mp3, .aiff, .aac, .ogg, .flac): uploaded for transcription by audio-capable models.

Three media categories with their accepted extensions — images and audio carry several formats, while PDFs lean on native multimodal support where the provider offers it.

PDFs use native multimodal support when the provider offers it. Otherwise TeXRA converts them via GraphicsMagick / ImageMagick + Ghostscript; status for those system dependencies lives on Dashboard → LaTeX.
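To make the three categories concrete, here is a minimal Python sketch that sorts file names into them by extension. The extension sets are copied from the list above; the helper name `media_category` and the "unsupported" fallback are illustrative assumptions, not TeXRA's implementation, and the real set is whatever texra.files.included.mediaExtensions is configured to.

```python
from pathlib import Path

# Extension sets copied from the list above. The effective set in TeXRA is
# whatever texra.files.included.mediaExtensions holds.
MEDIA_CATEGORIES = {
    "image": {".png", ".jpeg", ".jpg", ".gif", ".heic", ".heif", ".webp"},
    "document": {".pdf"},
    "audio": {".wav", ".m4a", ".mp3", ".aiff", ".aac", ".ogg", ".flac"},
}


def media_category(filename: str) -> str:
    """Hypothetical helper: map a file name to its media category."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    for category, extensions in MEDIA_CATEGORIES.items():
        if suffix in extensions:
            return category
    return "unsupported"  # assumption: anything else is not a listed media type


for name in ["figure1.pdf", "plot.png", "talk.m4a", "notes.txt"]:
    print(name, "->", media_category(name))
```

Keeping the mapping in one dictionary mirrors the single configurable setting: changing the accepted extensions means editing one place.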

Clipboard Images

Paste images directly into the instruction area:

  1. Copy any image to the clipboard.
  2. Paste with Ctrl/Cmd+V.
  3. The image is saved and referenced as [pasted_timestamp_hash.ext].
  4. The Media Files list updates automatically.

Pasted images are stored temporarily and cleaned up after 3 days.

Automatic Figure Extraction

Enable via the Auto Extract dropdown near the Media label. The lit wand button opens a checkbox menu where you toggle Figures, TikZ Figures, and Compile Input PDF.

  • Figures: extracts images from \includegraphics commands.
  • TikZ Figures: extracts tikzpicture environments as .tikz files and compiles them.

Figure Extraction Tools

Tool-use agents can extract figures programmatically. These are part of the LaTeX Extraction built-in tool group — always available. In a run, an agent like research drives them one after another, attaching what it finds so a multimodal model can read it:

research: gathering a paper's figures · LaTeX Extraction · this run
  • extract_figures (paper/main.tex): Found 9 \includegraphics, attached 9 files
  • extract_bib_entries (paper/main.tex): Returned BibTeX for every \cite key
  • extract_tikz_figures (paper/main.tex): compile: true → latexmk renders 7 tikzpicture snippets, attaches 7 PDFs

The three LaTeX Extraction tools as they surface in the Progress view — each returns referenced files, BibTeX records, or compiled TikZ PDFs the model can read directly. The raw request form for each is below.

extract_figures

Scans for \includegraphics and returns referenced files (up to 20 attachments):

json
{
  "name": "extract_figures",
  "arguments": { "texPath": "paper/main.tex" }
}
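For a rough idea of what such a scan involves, the sketch below finds \includegraphics paths with a regular expression and applies the 20-attachment cap mentioned above. It is an illustration only: the function name `find_figures`, the regex, and the comment handling are assumptions, and the actual tool may resolve paths and extensions differently.

```python
import re

# Matches \includegraphics, an optional star, optional [options], then {path}.
INCLUDEGRAPHICS = re.compile(r"\\includegraphics\*?(?:\[[^\]]*\])?\{([^}]+)\}")
MAX_ATTACHMENTS = 20  # cap stated for extract_figures


def find_figures(tex_source: str, limit: int = MAX_ATTACHMENTS) -> list[str]:
    """Hypothetical helper: list referenced figure paths in document order."""
    paths = []
    for line in tex_source.splitlines():
        code = re.sub(r"(?<!\\)%.*", "", line)  # ignore LaTeX comments
        for match in INCLUDEGRAPHICS.finditer(code):
            paths.append(match.group(1).strip())
    return paths[:limit]


sample = r"""
\includegraphics[width=\linewidth]{figs/plot.png}
% \includegraphics{figs/old.pdf}
\includegraphics{figs/schematic}
"""
print(find_figures(sample))  # ['figs/plot.png', 'figs/schematic']
```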

extract_tikz_figures

Extracts and optionally compiles tikzpicture environments (up to 12 PDFs):

json
{
  "name": "extract_tikz_figures",
  "arguments": { "texPath": "paper/main.tex", "compile": true }
}

Setting compile: true runs latexmk/pdflatex on each snippet and attaches the resulting PDFs so multimodal models can read them directly.

extract_bib_entries

Retrieves BibTeX records for every citation key found in the document.
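This page does not show a request form for extract_bib_entries. Assuming it follows the same name/arguments shape as the two tools above and takes a texPath argument (an assumption, suggested only by the paper/main.tex target shown in the run summary), the three requests could be built like this in Python. The helper `tool_request` is hypothetical.

```python
import json


def tool_request(name: str, tex_path: str, **options) -> dict:
    """Hypothetical helper: build a payload in the name/arguments shape shown above."""
    return {"name": name, "arguments": {"texPath": tex_path, **options}}


calls = [
    tool_request("extract_figures", "paper/main.tex"),
    tool_request("extract_tikz_figures", "paper/main.tex", compile=True),
    # Assumption: extract_bib_entries accepts texPath like the other two tools.
    tool_request("extract_bib_entries", "paper/main.tex"),
]

for call in calls:
    print(json.dumps(call))
```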

Using Media Files with Models

When you provide media files, they're handed to the model according to its capabilities:

  • Vision models (GPT-5.5, Claude 4.7 Opus / 4.6 Sonnet, Gemini 3.1 Pro, …): images are encoded and attached to the prompt.
  • Audio models: audio files are uploaded for transcription.
  • Non-multimodal models: only filenames are passed as context.

Common use cases:

  • Write captions for images (polish agent)
  • Verify text matches figures (correct agent)
  • Generate text from images/PDFs (ocr agent)
  • Transcribe audio (transcribe_audio agent)

For TikZ-specific workflows, see TikZ Figures.