Convert MCP API
Document OCR, text extraction, and audio transcription via Model Context Protocol
The Convert MCP API gives AI assistants document OCR, text extraction, and audio transcription capabilities. It extracts text from PDFs, images, and Office documents and transcribes audio files using OCR and speech recognition, with support for 60+ languages.
MCP Endpoint
https://mcp.ainoflow.io/mcp/v1/convert
Transport:
Configuration
Add MCP server in the UI dialog:
Add to your Claude Desktop configuration:
Add to your Cursor MCP configuration:
Images (OCR)
PDF, JPEG, PNG, TIFF, BMP, WebP, GIF
Documents
Word (.doc, .docx), RTF, ODT, TXT
Spreadsheets
Excel (.xls, .xlsx), ODS
Presentations
PowerPoint (.ppt, .pptx), ODP
Audio (Transcription)
WAV, MP3, M4A, MP4, WebM, OGG, FLAC, AAC, Opus
Maximum file size: 100MB
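The format table and size limit above can be checked on the client before a file is submitted. The following is a minimal sketch, not part of the Convert MCP API: the function names are hypothetical, the "jpg" and "tif" aliases are assumed extensions for JPEG and TIFF, and 100MB is assumed to mean 100 * 1024 * 1024 bytes.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions transcribed from the supported-format list above.
# "jpg" and "tif" are assumed aliases; the list names only JPEG and TIFF.
SUPPORTED_FORMATS = {
    "image_ocr": {"pdf", "jpeg", "jpg", "png", "tiff", "tif", "bmp", "webp", "gif"},
    "document": {"doc", "docx", "rtf", "odt", "txt"},
    "spreadsheet": {"xls", "xlsx", "ods"},
    "presentation": {"ppt", "pptx", "odp"},
    "audio": {"wav", "mp3", "m4a", "mp4", "webm", "ogg", "flac", "aac", "opus"},
}

# Assumption: "100MB" is interpreted as 100 MiB.
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024


def classify_source(source_url: str) -> Optional[str]:
    """Return the format category for a URL, or None if the extension is not listed."""
    # Use only the URL path so query strings do not affect the extension.
    suffix = PurePosixPath(urlparse(source_url).path).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return category
    return None


def within_size_limit(size_bytes: int) -> bool:
    """Check a known file size against the documented maximum."""
    return 0 <= size_bytes <= MAX_FILE_SIZE_BYTES
```

A check like this only looks at the file name, so the server may still reject a file whose content does not match its extension.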
Available Tools
Parameters
sourceUrl: HTTP/HTTPS URL to download the file
languages: Comma-separated ISO codes (default: "en"). Examples: "en", "en,de,fr", "zh-cn,ja"
outputs: Output formats: "text", "pdf", or "text,pdf" (default: "text"). PDF is not available for audio.
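To illustrate how these parameters fit together, here is a hedged sketch that builds an MCP `tools/call` JSON-RPC request. The JSON-RPC envelope follows the Model Context Protocol, but the tool name `convert` is a placeholder (the actual tool name is not given in this section), and the helper and its `is_audio` flag are hypothetical.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical tool name; substitute the name the server actually advertises.
TOOL_NAME = "convert"
ALLOWED_OUTPUTS = {"text", "pdf"}


def build_convert_call(
    source_url: str,
    languages: Optional[str] = None,
    outputs: Optional[str] = None,
    *,
    is_audio: bool = False,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a tools/call request; omitted parameters fall back to server defaults."""
    if not source_url.lower().startswith(("http://", "https://")):
        raise ValueError("sourceUrl must be an HTTP or HTTPS URL")

    arguments: Dict[str, Any] = {"sourceUrl": source_url}

    if languages is not None:
        # Normalize "en, de" to "en,de".
        codes = [code.strip() for code in languages.split(",") if code.strip()]
        if not codes:
            raise ValueError("languages must contain at least one code")
        arguments["languages"] = ",".join(codes)

    if outputs is not None:
        formats = [fmt.strip().lower() for fmt in outputs.split(",") if fmt.strip()]
        if not formats or not set(formats) <= ALLOWED_OUTPUTS:
            raise ValueError('outputs must be "text", "pdf", or "text,pdf"')
        if is_audio and "pdf" in formats:
            raise ValueError("PDF output is not available for audio")
        arguments["outputs"] = ",".join(formats)

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": TOOL_NAME, "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_convert_call("https://example.com/scan.pdf", "en,de", "text,pdf")
    print(json.dumps(request, indent=2))
```

Leaving `languages` and `outputs` out of the arguments, rather than sending the default values explicitly, keeps the request aligned with whatever defaults the server applies.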
Example - Document OCR
Example - Audio Transcription
Available Resources
Returns: JSON with job status and pre-signed URLs
Returns: Extracted text directly (not URL)
Returns: Base64-encoded PDF blob (not URL)
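The three return shapes above map onto the two content forms an MCP `resources/read` result can carry: each item in `contents` holds either a `text` field or a Base64-encoded `blob` field. The sketch below shows one way a client might decode them. The helper names are hypothetical, and the resource URIs and the fields inside the job-status JSON are not specified in this section, so none are assumed.

```python
import base64
import json
from typing import Any, Dict, Union


def decode_resource_content(item: Dict[str, Any]) -> Union[str, bytes]:
    """Decode one entry of a resources/read `contents` list.

    Text resources (extracted text, job-status JSON) arrive in `text`;
    binary resources (the PDF) arrive Base64-encoded in `blob`.
    """
    if "text" in item:
        return item["text"]
    if "blob" in item:
        # validate=True rejects characters outside the Base64 alphabet.
        return base64.b64decode(item["blob"], validate=True)
    raise ValueError("resource content has neither 'text' nor 'blob'")


def parse_job_status(item: Dict[str, Any]) -> Dict[str, Any]:
    """Parse the job-status resource, which is returned as JSON text."""
    payload = decode_resource_content(item)
    if not isinstance(payload, str):
        raise ValueError("job status is expected to be JSON text, not a blob")
    return json.loads(payload)


def save_pdf(item: Dict[str, Any], path: str) -> int:
    """Write a decoded PDF blob to disk and return the number of bytes written."""
    payload = decode_resource_content(item)
    if not isinstance(payload, bytes):
        raise ValueError("PDF resource is expected to be a Base64 blob")
    with open(path, "wb") as handle:
        return handle.write(payload)
```

Because the text and PDF resources return content directly rather than a URL, no second download step is needed for them; only the job-status resource points to pre-signed URLs.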
languages: "en" (English)
outputs: "text" (plain text)
models: "auto" (PaddleOCR/Tesseract/Whisper)
For most use cases, only specify sourceUrl. The defaults cover English text extraction and audio transcription.