Documentation

Getting Started

This section will give you a brief overview of the 2markdown API.

To get started with our API, you need an API key. Every request must be authenticated with the X-Api-Key header. You can find your API key on the API keys page.
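Since the X-Api-Key header is needed on every request, it can help to build it in one place and keep the key out of your source code. The following is a minimal sketch; the helper name `auth_headers` and the environment variable `TWOMARKDOWN_API_KEY` are assumptions made for illustration, not part of the API.

```python
import os


def auth_headers(env_var: str = "TWOMARKDOWN_API_KEY") -> dict:
    """Build the authentication header from an environment variable.

    The variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {"X-Api-Key": api_key}
```

The returned dictionary can be passed as the `headers` argument in the Python examples throughout this page.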

We provide a number of API endpoints to help you process data:

PDF To Markdown
URL To Markdown
URL To Markdown With JavaScript Support (deprecated)
File To Markdown
HTML To Markdown

PDF To Markdown

The PDF to Markdown endpoint allows you to convert PDF documents to Markdown format. The service processes each page of the PDF and returns the extracted text content as well as any identified images/attachments. This is an asynchronous API: you receive a job ID immediately and then check the job's status until it completes.

The endpoint is available at https://api.2markdown.com/v1/pdf2md and accepts POST requests with a multipart/form-data body containing the PDF file to be processed.

Cost: 5 credits per page processed.

Limitations:

Maximum file size: 200MB
Results are stored for 24 hours
Any extracted images/attachments are available for 24 hours
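The cost and size limit above can be checked on the client before uploading. The following is a minimal sketch; the helper names are hypothetical, and it assumes "200MB" means 200,000,000 bytes (the stricter reading), which the documentation does not specify.

```python
import os

CREDITS_PER_PAGE = 5  # from the cost stated above
MAX_PDF_BYTES = 200 * 1000 * 1000  # assumption: "200MB" read as decimal megabytes


def estimate_pdf_credits(pages: int) -> int:
    """Return the credits a PDF with the given page count would cost."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    return pages * CREDITS_PER_PAGE


def check_pdf_size(path: str) -> int:
    """Return the file size in bytes, or raise if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > MAX_PDF_BYTES:
        raise ValueError(f"{path} is {size} bytes, above the {MAX_PDF_BYTES} byte limit")
    return size
```

For example, `estimate_pdf_credits(12)` returns 60, since a 12-page PDF costs 12 × 5 credits.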

Initial Request

Parameter	Description
document	The PDF file to be processed (file upload)

Initial Response

Field	Description
jobId	Unique identifier for tracking the conversion job
status	Initial status (always "pending")
pages	Number of pages in the PDF

Status Check Endpoint

Check the status of your conversion job using a GET request to https://api.2markdown.com/v1/pdf2md/{jobId}

Processing Response

Field	Description
status	Current status ("processing", "completed", or "failed")
progress	Conversion progress as an integer percentage (0-100)
pages	Total number of pages in the PDF
checkIn	Recommended seconds to wait before next status check (varies based on PDF size and page count)

Completed Response

Field	Description
status	Status ("completed")
result	The converted markdown content
attachments	Array of extracted images/attachments (if any, available for 24 hours)
pages	Total number of pages processed
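The status fields above suggest a simple polling loop: fetch the status, stop on "completed" or "failed", and otherwise wait for the number of seconds given in checkIn. The following is a minimal sketch under those assumptions; `wait_for_pdf_job`, its timeout, and the fallback delay are hypothetical choices, and the HTTP call is passed in as a function so the loop itself stays independent of any HTTP library.

```python
import time
from typing import Callable


def wait_for_pdf_job(
    fetch_status: Callable[[], dict],
    timeout_seconds: float = 600.0,
    default_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports "completed"; raise on "failed" or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "completed":
            return body
        if status == "failed":
            raise RuntimeError(f"PDF conversion failed: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError("PDF conversion did not finish in time")
        # Use the server's recommended delay when present, else a fallback.
        wait = body.get("checkIn") or default_wait
        sleep(float(wait))
```

With the `requests` library, `fetch_status` could be `lambda: requests.get(status_url, headers=headers).json()`, where `status_url` is the status check endpoint for your job ID.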

Examples:

curl --request POST \
  --url https://api.2markdown.com/v1/pdf2md \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-Api-Key: YOUR_API_KEY' \
  --form 'document=@/path/to/your/document.pdf'

import requests

url = "https://api.2markdown.com/v1/pdf2md"
headers = {
    "X-Api-Key": "YOUR_API_KEY"
}

files = {
    'document': ('document.pdf', open('path/to/your/document.pdf', 'rb'))
}

# Initial upload
response = requests.post(url, headers=headers, files=files)
job_id = response.json()['jobId']

# Check status
status_url = f"https://api.2markdown.com/v1/pdf2md/{job_id}"
status_response = requests.get(status_url, headers=headers)
print(status_response.json())

const form = new FormData();
form.append('document', document.querySelector('input[type="file"]').files[0]);

// Initial upload
const response = await fetch('https://api.2markdown.com/v1/pdf2md', {
    method: 'POST',
    headers: {
        'X-Api-Key': 'YOUR_API_KEY'
    },
    body: form
});
const { jobId } = await response.json();

// Check status
const statusResponse = await fetch(`https://api.2markdown.com/v1/pdf2md/${jobId}`, {
    headers: {
        'X-Api-Key': 'YOUR_API_KEY'
    }
});
const result = await statusResponse.json();
console.log(result);

URL To Markdown (supports JavaScript)

The URL to Markdown endpoint is the recommended way to extract the essential content of a website as plaintext. It supersedes the previous /api/2md endpoint and offers enhanced functionality and support.

The endpoint is available at https://api.2markdown.com/v1/url2md and accepts POST requests with a JSON body containing the URL to be processed.

Cost: 1 credit per request.

Parameter	Description
url	The URL to be processed

Example:

curl --request POST \
  --url https://api.2markdown.com/v1/url2md \
  --header 'Content-Type: application/json' \
  --header 'X-Api-Key: YOUR_API_KEY' \
  --data '{"url": "https://www.retently.com"}'

import requests

url = "https://api.2markdown.com/v1/url2md"
headers = {
    "X-Api-Key": "YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "url": "https://www.retently.com"
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

const response = await fetch('https://api.2markdown.com/v1/url2md', {
    method: 'POST',
    headers: {
        'X-Api-Key': 'YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        url: 'https://www.retently.com'
    })
});
const result = await response.json();
console.log(result);

Note: While the /api/2md endpoint is still operational, we encourage developers to transition to the /v1/url2md endpoint to take advantage of its improved features and continued support.

URL To Markdown With JavaScript Support (deprecated)

The URL to Markdown endpoint with JavaScript support allows you to extract the essential content of a website as plaintext, including any JavaScript-rendered resources. Note that this endpoint may have longer response times.

The endpoint is available at https://api.2markdown.com/v1/url2mdjs and accepts POST requests with a JSON body containing the URL to be processed.

Cost: 1 credit per request.

Parameter	Description
url	The URL to be processed

Example:

curl --request POST \
  --url https://api.2markdown.com/v1/url2mdjs \
  --header 'Content-Type: application/json' \
  --header 'X-Api-Key: YOUR_API_KEY' \
  --data '{"url": "https://www.retently.com"}'

import requests

url = "https://api.2markdown.com/v1/url2mdjs"
headers = {
    "X-Api-Key": "YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "url": "https://www.retently.com"
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

const response = await fetch('https://api.2markdown.com/v1/url2mdjs', {
    method: 'POST',
    headers: {
        'X-Api-Key': 'YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        url: 'https://www.retently.com'
    })
});
const result = await response.json();
console.log(result);

File To Markdown

The File to Markdown endpoint allows you to convert an uploaded HTML file to Markdown format.

The endpoint is available at https://api.2markdown.com/v1/file2md and accepts POST requests with a multipart/form-data body containing the HTML file to be processed.

Cost: 1 credit per request.

Request

Parameter	Description
document	The HTML file to be processed (file upload)

Headers

Header	Value
X-Api-Key	Your API key
Content-Type	multipart/form-data

Examples:

curl --request POST \
  --url https://api.2markdown.com/v1/file2md \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-Api-Key: YOUR_API_KEY' \
  --form 'document=@/path/to/your/file.html'

import requests

url = "https://api.2markdown.com/v1/file2md"
headers = {
    "X-Api-Key": "YOUR_API_KEY"
}

files = {
    'document': ('page.html', open('path/to/your/page.html', 'rb'))
}

response = requests.post(url, headers=headers, files=files)
print(response.json())

const form = new FormData();
form.append('document', document.querySelector('input[type="file"]').files[0]);

const response = await fetch('https://api.2markdown.com/v1/file2md', {
    method: 'POST',
    headers: {
        'X-Api-Key': 'YOUR_API_KEY'
    },
    body: form
});
const result = await response.json();
console.log(result);

HTML To Markdown

The HTML to Markdown endpoint allows you to convert raw HTML content to Markdown format.

The endpoint is available at https://api.2markdown.com/v1/html2md and accepts POST requests with a JSON body containing the HTML content to be processed.

Cost: 1 credit per request.

Request

Parameter	Description
html	The HTML content to be converted to Markdown

Headers

Header	Value
X-Api-Key	Your API key
Content-Type	application/json

Examples:

curl --request POST \
  --url https://api.2markdown.com/v1/html2md \
  --header 'Content-Type: application/json' \
  --header 'X-Api-Key: YOUR_API_KEY' \
  --data '{
    "html": "<h1>Hello, World!</h1><p>This is a test HTML content.</p>"
}'

import requests

url = "https://api.2markdown.com/v1/html2md"
headers = {
    "X-Api-Key": "YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "html": "<h1>Hello, World!</h1><p>This is a test HTML content.</p>"
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

const response = await fetch('https://api.2markdown.com/v1/html2md', {
    method: 'POST',
    headers: {
        'X-Api-Key': 'YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        html: '<h1>Hello, World!</h1><p>This is a test HTML content.</p>'
    })
});
const result = await response.json();
console.log(result);

Note: The HTML content should be properly escaped in the JSON payload. The API will unescape the HTML before processing.
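In practice, the escaping mentioned in the note is easiest to leave to a JSON library rather than doing it by hand. The following is a minimal sketch using Python's standard `json` module; the sample HTML is made up for illustration.

```python
import json

# Sample HTML containing double quotes and a newline, both of which
# must be escaped inside a JSON string.
html = '<a href="https://example.com" class="link">Say "hi"</a>\n<p>Second line</p>'

# json.dumps escapes the quotes and the newline, so the payload is valid JSON.
payload = json.dumps({"html": html})
print(payload)

# Decoding the payload gives back the original HTML unchanged.
assert json.loads(payload)["html"] == html
```

Passing a dictionary through the `json=` argument of `requests.post`, as in the Python example above, performs the same serialization automatically.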

Responses

The response will be a JSON object with the following fields.

Response Fields

Field	Description
article	The essential content of the website as plaintext
error	In case of an error this field is present and gives an indication of what the problem is.

Response Status Codes

The following are status codes the server might answer with.

Status Code	Description
200	The request is valid, counts towards your quota.
400	The request is invalid. Check the documentation above for the structure of valid requests, and make sure the URL is valid.
422	We couldn't extract enough content. Does not count towards your quota and is not billed.
500	Something went wrong. Please try again later.

Example:

{
  "article": "## Empower Your Knowledge,  Unclutter Your Inbox\n\n## ..."
}
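The status codes and response fields above can be handled in one place. The following is a minimal sketch; `interpret_response` and the choice of exception types are hypothetical, and it assumes the error field may be absent from some error responses.

```python
def interpret_response(status_code: int, body: dict) -> str:
    """Return the markdown on success, raise a descriptive error otherwise."""
    if status_code == 200:
        return body["article"]
    detail = body.get("error", "no error message provided")
    if status_code == 400:
        raise ValueError(f"Invalid request: {detail}")
    if status_code == 422:
        # Not billed, per the table above.
        raise LookupError(f"Not enough content could be extracted: {detail}")
    if status_code == 500:
        raise RuntimeError(f"Server error, try again later: {detail}")
    raise RuntimeError(f"Unexpected status {status_code}: {detail}")
```

With `requests`, this could be called as `interpret_response(response.status_code, response.json())`.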

Integrations

Using LangChain

You can use 2markdown with the LangChain document loader as follows.

First, install langchain with pip install langchain (more information here).

Then, you can use the following code to extract the article from a website:

from langchain.document_loaders import ToMarkdownLoader

# https://2markdown.com
md_api_key = "YOUR_KEY"

loader = ToMarkdownLoader(
    url="https://www.retently.com",
    api_key=md_api_key)

docs = loader.load()

# the loader returns an article containing the markdown
print(docs[0].page_content)

# output:
# ## Empower Your Knowledge,  Unclutter Your Inbox
# 
# ## Drowning in a Sea of Newsletters?
#
# Are you tired of wading through countless newsletters, struggling to find the
# information you need? Is your inbox cluttered with irrelevant content, making it
# difficult to stay up-to-date in today's fast-paced world? You're not alone. We
# know the feeling and that's why we created Retently.
# ...

For an example of how to use LangChain with 2markdown, have a look at our use case »Extracting Structured News Events With LangChain And OpenAI Functions«.

👉 You can find more information in the official LangChain documentation.

Using OpenAI Functions

This section gives a brief overview of how to use 2markdown as a function with the OpenAI Python library.

First, install the OpenAI Python library with pip install openai, and the requests library (used to call 2markdown) with pip install requests.

Let's do some preparations first:

# https://2markdown.com
md_api_key = "YOUR_KEY"
# https://openai.com
openai_api_key = "YOUR_KEY"

import openai
import json
import requests

openai.api_key = openai_api_key

Then, we can implement a function we'll later provide to OpenAI. It calls the 2markdown API with a provided URL:

# example: get_website_content("https://www.retently.com")
def get_website_content(url):
    res = requests.post("https://api.2markdown.com/v1/url2md", json={"url":url}, headers={"X-Api-Key":md_api_key})
    
    if res.status_code == 200:
        return res.json()['article']
    
    return None

Now, we can call the function with OpenAI:

# define the function
functions = [
    {
        "name": "get_website_content",
        "description": "Retrieve the content of any website in URL format",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The URL of the website, e.g. https://openai.com",
                },
            },
            "required": ["url"],
        },
    }
]

# construct the prompt
messages = [{"role": "user", "content": "What's https://www.retently.com about?"}]  # <- this is our prompt
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    functions=functions,
    function_call="auto",  # auto is default, but we'll be explicit
)
response_message = response["choices"][0]["message"]

If OpenAI decided to call the function, the response is not very meaningful yet. It only tells us that it would like to call the function:

>>> print(response_message)
{
  "content": null,
  "function_call": {
    "arguments": "{\n  \"url\": \"https://www.retently.com\"\n}",
    "name": "get_website_content"
  },
  "role": "assistant"
}

In that case, let's fetch the actual response:

if response_message.get("function_call"):
    # call the function
    # Note: the JSON response may not always be valid; be sure to handle errors
    available_functions = {
        "get_website_content": get_website_content,
    }
    
    # only one function in this example, but you can have multiple
    function_name = response_message["function_call"]["name"]
    function_to_call = available_functions[function_name]
    function_args = json.loads(response_message["function_call"]["arguments"])
    function_response = function_to_call(
        url=function_args.get("url"),
    )

    # send the info on the function call and function response to GPT
    messages.append(response_message)  # extend conversation with assistant's reply
    messages.append(
        {
            "role": "function",
            "name": function_name,
            "content": function_response,
        }
    )
    
    # extend conversation with function response
    second_response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    
    print(second_response["choices"][0]["message"])

The response is now much more meaningful:

{
  "content": "Retently is a platform that aims to simplify the experience of managing and reading newsletters. It provides users with a dedicated email address for newsletter subscriptions and allows them to organize, customize, and schedule digests based on their topics and preferences. By uncluttering the inbox and streamlining the newsletter reading process, Retently enables users to stay up-to-date with industry insights, learn from experts, and become thought leaders in their fields.",
  "role": "assistant"
}
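As the comment in the code above notes, the arguments returned by the model may not always be valid JSON. The following is a minimal sketch of a defensive parser that could replace the bare `json.loads` call; `parse_function_args` is a hypothetical helper, not part of the OpenAI library.

```python
import json


def parse_function_args(raw_arguments: str, required: tuple = ("url",)) -> dict:
    """Parse model-generated arguments, rejecting invalid JSON or missing keys."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("Arguments must be a JSON object")
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    return args
```

Catching the resulting `ValueError` lets you skip the function call, or ask the model again, instead of crashing on malformed output.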

👉 You can find more information in the official OpenAI documentation about function calling with GPT. You'll also find the original code snippet.