
Batch Processing

Workflow

Enrich up to 5,000 URLs in a single batch. Processing runs asynchronously, credits are charged only the first time you download a run's results, and each download is a presigned S3 link that you can request again as many times as you want at no extra cost.


How it works

The batch workflow has four required steps, plus an optional step for listing a batch's runs. After you create a batch and start processing, runs execute asynchronously in the background: poll the status endpoint until each run completes, then download. The download itself has two phases: a POST that charges credits and returns a presigned S3 URL, followed by a plain GET that streams the CSV from S3. Later downloads of the same run are free and simply return a fresh URL.

Supported output types

parent_companies, corporate_families, pe_ownership, pe_portfolios

Quick reference

| Step | Method | Path | Purpose |
|---|---|---|---|
| 1 | POST | /v1/batch | Create a batch |
| 2 | POST | /v1/batch/{batch_id}/process | Process the batch |
| 3 | GET | /v1/batch/{batch_id}/runs | List runs for a batch (optional) |
| 4 | GET | /v1/batch/runs/{run_id} | Get run status |
| 5 | POST | /v1/batch/runs/{run_id}/download | Download run results |

Pricing per output type

Each output type follows the same pricing as its single-URL equivalent. Credits are only charged on download — not on create or process.

| Output type | Pricing rule |
|---|---|
| parent_companies | 70 credits per URL |
| pe_ownership | 70 credits per URL |
| corporate_families | 20 credits per record returned (minimum 1 per URL) |
| pe_portfolios | 20 credits per record returned (minimum 1 per URL) |
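To sanity-check a run's price before downloading, the rules in the table can be written as a small function. This is a sketch, not the billing implementation: estimate_price is a hypothetical helper, and it assumes that per-URL types bill every input URL and that "minimum 1 per URL" means a URL returning zero records is still billed as one record. The authoritative number is always the run's price field, which in the examples on this page equals priced_records * price_per_record.

```python
# Rates from the pricing table. Interpreting "minimum 1 per URL" as
# "a URL with zero records is billed as one record" is an assumption.
PER_URL_RATES = {"parent_companies": 70, "pe_ownership": 70}
PER_RECORD_RATES = {"corporate_families": 20, "pe_portfolios": 20}


def estimate_price(output_type: str, records_per_url: list[int]) -> int:
    """Estimate the credits a run will cost.

    records_per_url[i] is the number of records returned for input URL i.
    """
    if output_type in PER_URL_RATES:
        # Flat rate per input URL; the record counts themselves don't matter.
        return PER_URL_RATES[output_type] * len(records_per_url)
    if output_type in PER_RECORD_RATES:
        # Each URL is billed for at least one record (assumed reading).
        priced_records = sum(max(1, n) for n in records_per_url)
        return PER_RECORD_RATES[output_type] * priced_records
    raise ValueError(f"unsupported output_type: {output_type!r}")


print(estimate_price("parent_companies", [1, 1, 1]))    # 210
print(estimate_price("corporate_families", [4, 0, 2]))  # 140 (20 * (4 + 1 + 2))
```

With the example run used on this page (three URLs, parent_companies), the estimate is 210 credits, matching the price shown in the run status response.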

Steps in detail

Step 1: Create a batch

POST /v1/batch

Submit a list of URLs to create a new batch. URLs are validated; any that can't be parsed are returned in invalid_urls but don't prevent batch creation. Save the returned batch_id — you'll need it for the next step.

Request body

| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | array of strings | Required | List of company URLs to enrich. Maximum 5,000 per batch. |

Example request

```json
{
  "urls": [
    "https://example.com",
    "https://subsidiary.example.com",
    "https://other.example.com"
  ]
}
```

Example response

```json
{
  "status": "Success",
  "batch_id": "fa25641c-c650-4e3e-ba43-1455f730ba19",
  "total_url_count": 3,
  "valid_url_count": 3,
  "invalid_url_count": 0,
  "invalid_urls": []
}
```

Code example

```bash
curl -X POST https://api.magellandata.io/v1/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com",
      "https://subsidiary.example.com",
      "https://other.example.com"
    ]
  }'
```
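A single batch accepts at most 5,000 URLs, and a larger list is rejected with a 400, so longer lists need to be split across several batches. Below is a minimal sketch using only the Python standard library; chunk_urls and create_batches are hypothetical helper names, and the request and response shapes are the ones shown in this step.

```python
import json
import urllib.request

BASE_URL = "https://api.magellandata.io/v1"
MAX_URLS_PER_BATCH = 5000  # documented per-batch limit


def chunk_urls(urls, size=MAX_URLS_PER_BATCH):
    """Split a URL list into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def create_batches(urls, api_key):
    """Create one batch per chunk; return (batch_ids, invalid_urls)."""
    batch_ids, invalid = [], []
    for chunk in chunk_urls(urls):
        req = urllib.request.Request(
            f"{BASE_URL}/batch",
            data=json.dumps({"urls": chunk}).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = json.load(resp)
        batch_ids.append(body["batch_id"])
        # Unparseable URLs don't block creation; collect them for review.
        invalid.extend(body["invalid_urls"])
    return batch_ids, invalid
```

For example, a list of 12,000 URLs becomes three batches of 5,000, 5,000, and 2,000 URLs. urllib raises urllib.error.HTTPError for any 4xx or 5xx response, so a 400 surfaces as an exception rather than a return value.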

Step 2: Process the batch

POST /v1/batch/{batch_id}/process

Kick off async processing for one or more output types. Each output type becomes a separate run with its own run_id and price. The call returns immediately — actual processing happens in the background.

Request body

| Parameter | Type | Required | Description |
|---|---|---|---|
| output_types | array of strings | Required | One or more of: parent_companies, corporate_families, pe_ownership, pe_portfolios |

Example request

```json
{
  "output_types": [
    "parent_companies",
    "corporate_families"
  ]
}
```

Example response

```json
{
  "status": "Success",
  "batch_id": "fa25641c-c650-4e3e-ba43-1455f730ba19",
  "runs": [
    {
      "run_id": "184ec4d7-4cf0-4129-8a29-02bc78ba9045",
      "output_type": "parent_companies",
      "run_status": "processing",
      "input_records": 3
    },
    {
      "run_id": "33bbf2ff-d039-4bce-985c-3e7d3c99c64e",
      "output_type": "corporate_families",
      "run_status": "processing",
      "input_records": 3
    }
  ],
  "message": "Runs are processing asynchronously. Poll GET /batch/runs/{run_id} until run_status='completed', then POST /batch/runs/{run_id}/download."
}
```

Code example

```bash
curl -X POST https://api.magellandata.io/v1/batch/{batch_id}/process \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "output_types": [
      "parent_companies",
      "corporate_families"
    ]
  }'
```
Note: You can call process multiple times on the same batch with different output_types — each call creates a fresh set of runs with new run_ids. No credits are charged at this step.
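Since every call to process creates new runs, it helps to validate output_types before sending the request and to keep a mapping from run_id to output_type for polling. The sketch below does both; SUPPORTED_OUTPUT_TYPES mirrors the list at the top of this page, and validate_output_types, runs_by_id, and start_runs are hypothetical helpers built on the request and response shapes shown above.

```python
import json
import urllib.request

BASE_URL = "https://api.magellandata.io/v1"
SUPPORTED_OUTPUT_TYPES = {
    "parent_companies", "corporate_families", "pe_ownership", "pe_portfolios",
}


def validate_output_types(output_types):
    """Reject empty or unsupported lists before sending the request."""
    if not output_types:
        raise ValueError("output_types must not be empty")
    unknown = set(output_types) - SUPPORTED_OUTPUT_TYPES
    if unknown:
        raise ValueError(f"unsupported output_types: {sorted(unknown)}")
    return list(output_types)


def runs_by_id(process_response):
    """Map run_id -> output_type from a parsed process response body."""
    return {run["run_id"]: run["output_type"] for run in process_response["runs"]}


def start_runs(batch_id, output_types, api_key):
    """POST to the process endpoint and return {run_id: output_type}."""
    payload = {"output_types": validate_output_types(output_types)}
    req = urllib.request.Request(
        f"{BASE_URL}/batch/{batch_id}/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return runs_by_id(json.load(resp))
```

The returned dictionary has the same shape as the pending dictionary that the complete Python example at the end of this page builds before polling.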

Step 3: List runs for a batch (optional)

GET /v1/batch/{batch_id}/runs

Return all runs ever created for a batch, with their current status and pricing. Useful for recovering run_ids if a client crashed, or for auditing what's been processed.

Request body

This endpoint takes no request body — just the Authorization header.

Example response

```json
{
  "status": "Success",
  "batch_id": "fa25641c-c650-4e3e-ba43-1455f730ba19",
  "runs": [
    {
      "run_id": "184ec4d7-4cf0-4129-8a29-02bc78ba9045",
      "batch_id": "fa25641c-c650-4e3e-ba43-1455f730ba19",
      "output_type": "parent_companies",
      "run_status": "completed",
      "input_records": 3,
      "output_count": 3,
      "priced_records": 3,
      "price_per_record": 70,
      "price": 210,
      "is_purchased": false
    }
  ]
}
```

Code example

```bash
curl -X GET https://api.magellandata.io/v1/batch/{batch_id}/runs \
  -H "Authorization: Bearer YOUR_API_KEY"
```
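When recovering from a crash, the list-runs response tells you which runs still need polling, which can be bought, and which are already purchased. Here is a minimal sketch that works on the parsed JSON body; classify_runs and unpaid_cost are hypothetical helpers that rely only on fields shown in the example response (run_status, is_purchased, price).

```python
def classify_runs(list_runs_response):
    """Group run_ids by the next action a client should take."""
    groups = {"poll": [], "buy": [], "redownload": [], "failed": []}
    for run in list_runs_response["runs"]:
        status = run["run_status"]
        if status == "processing":
            groups["poll"].append(run["run_id"])
        elif status == "failed":
            groups["failed"].append(run["run_id"])
        elif status == "completed":
            # Purchased runs re-download for free; others cost run["price"].
            groups["redownload" if run["is_purchased"] else "buy"].append(run["run_id"])
    return groups


def unpaid_cost(list_runs_response):
    """Total credits needed to download every completed, unpurchased run."""
    return sum(
        run["price"]
        for run in list_runs_response["runs"]
        if run["run_status"] == "completed" and not run["is_purchased"]
    )


example = {  # trimmed version of the example response above
    "runs": [
        {"run_id": "184ec4d7-4cf0-4129-8a29-02bc78ba9045",
         "run_status": "completed", "price": 210, "is_purchased": False},
    ]
}
print(classify_runs(example)["buy"])  # ['184ec4d7-4cf0-4129-8a29-02bc78ba9045']
print(unpaid_cost(example))           # 210
```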

Step 4: Get run status

GET /v1/batch/runs/{run_id}

Check the status of a specific run. Poll this endpoint until run_status is "completed" (or "failed"). Once the run is completed, the price field tells you what the download will cost.

Request body

This endpoint takes no request body — just the Authorization header.

Example response

```json
{
  "status": "Success",
  "run_id": "184ec4d7-4cf0-4129-8a29-02bc78ba9045",
  "batch_id": "fa25641c-c650-4e3e-ba43-1455f730ba19",
  "output_type": "parent_companies",
  "run_status": "completed",
  "input_records": 3,
  "output_count": 3,
  "priced_records": 3,
  "price_per_record": 70,
  "price": 210,
  "is_purchased": false
}
```

Code example

```bash
curl -X GET https://api.magellandata.io/v1/batch/runs/{run_id} \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Note: Possible run_status values: processing (still running), completed (ready to download), failed (an error occurred — contact support if persistent).
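A polling loop should stop on either terminal status and should not run forever. The sketch below takes the status request as a function argument so the loop can be tested without network access. wait_for_run is a hypothetical helper, and the 5-second starting interval, doubling backoff, 60-second cap, and 1-hour default timeout are illustrative choices, not documented limits.

```python
import time


def wait_for_run(fetch_status, run_id, interval=5.0, max_interval=60.0,
                 timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(run_id) until run_status is 'completed' or 'failed'.

    fetch_status should return the parsed JSON body of GET /v1/batch/runs/{run_id}.
    Raises TimeoutError if the run is still processing after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(run_id)
        if status["run_status"] in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} still processing after {timeout}s")
        sleep(interval)
        # Back off gradually so long runs don't generate excessive requests.
        interval = min(interval * 2, max_interval)
```

In practice fetch_status would wrap the GET request shown above, and the caller should check whether the returned run_status is "failed" before attempting a download, since downloading a failed run returns 409.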

Step 5: Download run results

POST /v1/batch/runs/{run_id}/download

Charges credits on the first call per run, then returns a presigned S3 URL that points to the run's CSV. Follow the URL with a plain GET (no Authorization header — the URL is signed) to download the results. URLs expire after 1 hour, but you can call this endpoint again at any time to get a fresh URL without paying again — once a run is purchased, it's purchased forever.

Request body

This endpoint takes no request body — just the Authorization header.

Example response

```json
{
  "status": "Success",
  "run_id": "184ec4d7-4cf0-4129-8a29-02bc78ba9045",
  "output_type": "parent_companies",
  "output_count": 3,
  "price": 210,
  "download_url": "https://magellan-selfserve-requests.s3.amazonaws.com/batch_runs/.../parent_companies.csv?X-Amz-Algorithm=...&X-Amz-Signature=...",
  "download_format": "csv",
  "expires_in_seconds": 3600
}
```

Code example

```bash
curl -X POST https://api.magellandata.io/v1/batch/runs/{run_id}/download \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Note: This endpoint uses POST rather than GET because the first call has a side effect (charging credits). After purchase, each call returns a fresh presigned URL; the data lives in S3, and you can re-download it as many times as needed without paying again. To consume the results, follow download_url with a GET request. Do not include your Authorization header: the URL carries its own signature, and S3 will reject extra auth headers. If a URL expires (after 1 hour), the S3 GET returns HTTP 403 with an XML body containing "Request has expired"; just POST to this endpoint again to get a fresh URL. The repurchase check is keyed on the run, not the URL, so this is always free after the first download.
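The refresh-on-expiry behavior can be wrapped in a small helper. This sketch uses only the standard library: it GETs a previously saved URL (or requests a new one), sends no Authorization header to S3, and on an HTTP 403 whose body mentions "Request has expired" asks the download endpoint for a fresh URL once. download_run, save_url, get_download_url, is_expired_url_error, and the single-retry policy are hypothetical names and choices for illustration.

```python
import json
import shutil
import urllib.error
import urllib.request

BASE_URL = "https://api.magellandata.io/v1"


def is_expired_url_error(status_code, body):
    """True if an S3 GET failed because the presigned URL has expired."""
    return status_code == 403 and "Request has expired" in body


def get_download_url(run_id, api_key):
    """POST to the download endpoint (charges credits on the first call only)."""
    req = urllib.request.Request(
        f"{BASE_URL}/batch/runs/{run_id}/download",
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["download_url"]


def save_url(url, path):
    # Plain GET with no Authorization header: the URL carries its own signature.
    with urllib.request.urlopen(url, timeout=60) as resp, open(path, "wb") as f:
        shutil.copyfileobj(resp, f)


def download_run(run_id, api_key, path, cached_url=None):
    """Save a run's CSV to path, refreshing cached_url once if it has expired."""
    url = cached_url or get_download_url(run_id, api_key)
    try:
        save_url(url, path)
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", errors="replace")
        if not is_expired_url_error(err.code, body):
            raise
        # Stale link: ask for a fresh one; no charge once the run is purchased.
        save_url(get_download_url(run_id, api_key), path)
    return path
```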

Error responses

400 Error

Invalid request — empty URLs list, unsupported output_type, or batch exceeds 5,000 URLs

402 Error

Insufficient credits (returned on download when balance is below the run's price)

403 Error

Forbidden — Invalid or missing API key

404 Error

Batch or run not found, or not owned by your account

409 Error

Run is not complete (returned when downloading a run still processing or failed)
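One way to act on these codes is to turn them into typed exceptions, so a caller can, for example, stop on 402 but wait and retry on 409 after confirming the run is still processing. The exception classes and raise_for_batch_error below are hypothetical; only the status codes and their meanings come from the list above, and the available_credits and required_credits fields are the ones read by the complete Python example that follows.

```python
class BatchAPIError(Exception):
    """Base class for errors returned by the batch endpoints."""

    def __init__(self, status_code, body):
        super().__init__(f"HTTP {status_code}: {body}")
        self.status_code = status_code
        self.body = body


class BadRequest(BatchAPIError): ...
class InsufficientCredits(BatchAPIError): ...
class AuthError(BatchAPIError): ...
class NotFound(BatchAPIError): ...
class RunNotReady(BatchAPIError): ...


ERRORS_BY_STATUS = {
    400: BadRequest,           # empty URL list, unsupported output_type, > 5,000 URLs
    402: InsufficientCredits,  # balance below the run's price (download only)
    403: AuthError,            # invalid or missing API key
    404: NotFound,             # unknown batch/run, or owned by another account
    409: RunNotReady,          # download attempted on a processing or failed run
}


def raise_for_batch_error(status_code, body):
    """Raise the matching exception for a non-2xx status; do nothing on success."""
    if 200 <= status_code < 300:
        return
    raise ERRORS_BY_STATUS.get(status_code, BatchAPIError)(status_code, body)
```

Unlisted statuses (for example a 5xx) fall back to the generic BatchAPIError, so callers can still catch everything with one except clause.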

Complete Python example

End-to-end script showing batch creation, processing, polling, and download with error handling.

```python
import time

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.magellandata.io/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
TIMEOUT = 30  # seconds per HTTP request, so a stalled connection can't hang the script

urls = ["https://example.com", "https://subsidiary.example.com", "https://other.example.com"]

# 1. Create the batch
r = requests.post(f"{BASE_URL}/batch", headers=HEADERS, json={"urls": urls}, timeout=TIMEOUT)
r.raise_for_status()
batch_id = r.json()["batch_id"]
print(f"Batch created: {batch_id}")

# 2. Kick off processing for the output types you want
r = requests.post(
    f"{BASE_URL}/batch/{batch_id}/process",
    headers=HEADERS,
    json={"output_types": ["parent_companies", "corporate_families"]},
    timeout=TIMEOUT,
)
r.raise_for_status()
runs = r.json()["runs"]

# 3. Poll each run until it completes or fails
completed = {}
pending = {run["run_id"]: run["output_type"] for run in runs}
while pending:
    for run_id in list(pending.keys()):
        r = requests.get(f"{BASE_URL}/batch/runs/{run_id}", headers=HEADERS, timeout=TIMEOUT)
        r.raise_for_status()
        status = r.json()
        if status["run_status"] == "completed":
            completed[run_id] = status
            del pending[run_id]
            print(f"{status['output_type']}: completed, {status['price']} credits")
        elif status["run_status"] == "failed":
            print(f"{pending[run_id]}: FAILED")
            del pending[run_id]
    if pending:
        time.sleep(5)

# 4. Download each completed run in two phases:
#    a) POST to charge credits and get a presigned S3 URL
#    b) GET the URL to stream the CSV to disk
for run_id, status in completed.items():
    output_type = status["output_type"]

    # 4a. Purchase + URL
    r = requests.post(f"{BASE_URL}/batch/runs/{run_id}/download", headers=HEADERS, timeout=TIMEOUT)
    if r.status_code == 402:
        err = r.json()
        print(f"Insufficient credits: have {err['available_credits']}, need {err['required_credits']}")
        continue
    r.raise_for_status()
    download_url = r.json()["download_url"]

    # 4b. Stream the CSV from S3.
    #     Do NOT pass headers=HEADERS: the URL is self-signed and S3 will
    #     reject extra Authorization headers.
    csv_path = f"{output_type}.csv"
    with requests.get(download_url, stream=True, timeout=TIMEOUT) as resp:
        resp.raise_for_status()
        with open(csv_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)
    print(f"{output_type}: saved to {csv_path}")
```