Markdown is the lingua franca of LLMs and modern note-taking. PDF to Markdown takes any PDF — research paper, ebook, contract, scanned report — and gives you tidy .md with headings, paragraphs, and lists preserved. Drop the result into Obsidian, Notion, GitHub, an MDX docs site, or straight into ChatGPT, Claude, or Gemini.
Conversion runs locally with pdf.js. Scanned PDFs trigger an automatic OCR fallback via Tesseract. Files never leave your browser.
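One plausible way to decide when the OCR fallback should kick in is to check how much text the PDF's text layer actually yields for a page. The sketch below is an assumption about how such a check could look, not the converter's actual logic: `needsOcr` and `MIN_TEXT_CHARS` are hypothetical names, and the input strings stand in for the `str` fields that pdf.js returns from a page's `getTextContent()`.

```typescript
// Hypothetical threshold: a page yielding fewer visible characters than this
// is treated as scanned. The real cutoff, if any, is not documented here.
const MIN_TEXT_CHARS = 20;

// Returns true when a page's extracted text layer is too sparse to trust,
// which would be the signal to run OCR on that page instead.
function needsOcr(textItems: string[], minChars: number = MIN_TEXT_CHARS): boolean {
  // Ignore whitespace so that pages containing only blank runs count as empty.
  const visible = textItems.join("").replace(/\s+/g, "");
  return visible.length < minChars;
}
```

A per-page check like this keeps born-digital pages on the fast path and only pays the OCR cost where the text layer is missing.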
How to use it
- 1. Drop the PDF
  Born-digital and scanned PDFs both work. Up to ~50 MB recommended.
- 2. Wait for conversion
  Pages stream in as they are converted, and a progress bar tracks the run. Scanned PDFs take longer because OCR runs per page.
- 3. Copy or download .md
  Switch between Markdown, Cleaned (no syntax), and Raw. Copy the output, or download it as .md, to use in Obsidian, GitHub, or your RAG pipeline.
Why Markdown beats raw PDF text for AI
LLMs are trained on enormous amounts of GitHub-flavored Markdown, so they readily recognize ## headings, lists, and code blocks. That means better chunking for RAG, better summaries, and roughly half the tokens of the same content in HTML. Markdown also survives the round-trip into and out of a model, so structured edits stay structured.
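To illustrate why headings help with chunking, here is a minimal sketch that splits converted Markdown into one chunk per ## section. The `Chunk` shape and the `chunkByHeading` name are assumptions made for this example. A real RAG pipeline would likely also cap chunk size and handle other heading levels.

```typescript
// Hypothetical chunk shape: one entry per "## " section.
interface Chunk {
  heading: string; // empty string for text that appears before the first heading
  body: string;
}

// Splits Markdown on H2 headings so each chunk carries its own section title.
function chunkByHeading(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { heading: "", body: "" };

  for (const line of markdown.split("\n")) {
    if (line.startsWith("## ")) {
      // Close the previous section, skipping it if it is completely empty.
      if (current.heading || current.body.trim()) chunks.push(current);
      current = { heading: line.slice(3).trim(), body: "" };
    } else {
      current.body += line + "\n";
    }
  }
  if (current.heading || current.body.trim()) chunks.push(current);

  // Trim surrounding blank lines from each section body.
  return chunks.map((c) => ({ heading: c.heading, body: c.body.trim() }));
}
```

Each chunk can then be embedded together with its heading, which gives the retriever a short label for what the passage is about.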
How heading detection works
The converter walks every paragraph and promotes short, capitalized lines that don't end in sentence punctuation to ## (H2) headings. It's a heuristic: it works well for most reports and ebooks, but it can be over-eager on tables of contents. The Cleaned tab is always available if you want the same content with no Markdown syntax at all.
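The heuristic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the converter's source: the 80-character cutoff, the punctuation set, and the reading of "capitalized" as "starts with an uppercase ASCII letter" are all guesses, and the function names are hypothetical.

```typescript
// Assumed cutoff for what counts as a "short" line.
const MAX_HEADING_LENGTH = 80;

// Returns true when a paragraph looks like a heading under the assumed rules.
function looksLikeHeading(paragraph: string): boolean {
  const text = paragraph.trim();
  if (text.length === 0 || text.length > MAX_HEADING_LENGTH) return false;
  if (text.includes("\n")) return false; // headings are single lines
  if (/[.!?:;,]$/.test(text)) return false; // ends like a sentence or clause
  // Assumption: "capitalized" means the line starts with an uppercase letter.
  return /^[A-Z]/.test(text);
}

// Promotes heading-like paragraphs to H2 and joins everything as Markdown.
function promoteHeadings(paragraphs: string[]): string {
  return paragraphs
    .map((p) => (looksLikeHeading(p) ? `## ${p.trim()}` : p.trim()))
    .join("\n\n");
}
```

Under these rules a table-of-contents entry such as "Chapter 3 Results" would also be promoted, which is the kind of over-eager match mentioned above.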
Best for
- Research papers headed for Obsidian or Notion
- Building a Markdown-based RAG knowledge base
- Feeding clean structured context to ChatGPT or Claude
- Migrating PDF reports into MDX docs sites
- Long ebooks where heading structure matters