Batch Processing
Enrich up to 5,000 URLs in a single batch. Processing runs asynchronously, credits are charged only when you download results, and each download is served through a presigned S3 link that you can request again as many times as you want.
How it works
The batch workflow has four logical steps: create a batch, start processing, poll each run until it finishes, and download the results. Processing happens asynchronously in the background, so poll the status endpoint until each run completes, then download. The download step has two phases: a POST that charges credits and returns a presigned S3 URL, followed by a plain GET that streams the CSV from S3. Later downloads of the same run are free and simply return a fresh URL.
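The per-run decision logic described above can be sketched as a small function. This is only an illustration: next_action and its return labels are hypothetical names, and it assumes a run object that carries the run_status and is_purchased fields shown in the run status responses.

```python
def next_action(run: dict) -> str:
    """Return a label for what a client should do next with a run.

    The labels are hypothetical; only run_status and is_purchased
    come from the API responses shown in this document.
    """
    status = run.get("run_status")
    if status == "completed":
        # A purchased run only needs a fresh presigned URL (no charge).
        if run.get("is_purchased"):
            return "refetch_url"
        # The first download POST charges credits, then returns the URL.
        return "purchase_and_download"
    if status == "failed":
        return "give_up"
    # Any other status (for example "processing") means keep polling.
    return "keep_polling"
```

For example, a run still in "processing" maps to "keep_polling", while a completed run with is_purchased set to true maps to "refetch_url".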
Supported output types
- parent_companies
- corporate_families
- pe_ownership
- pe_portfolios
Quick reference
| Step | Method | Path | Purpose |
|---|---|---|---|
| 1 | POST | /v1/batch | Create a batch |
| 2 | POST | /v1/batch/{batch_id}/process | Process the batch |
| 3 | GET | /v1/batch/{batch_id}/runs | List runs for a batch (optional) |
| 4 | GET | /v1/batch/runs/{run_id} | Get run status |
| 5 | POST | /v1/batch/runs/{run_id}/download | Download run results |
Pricing per output type
Each output type follows the same pricing as its single-URL equivalent. Credits are only charged on download — not on create or process.
| Output type | Pricing rule |
|---|---|
| parent_companies | 70 credits per URL |
| pe_ownership | 70 credits per URL |
| corporate_families | 20 credits per record returned (minimum 1 per URL) |
| pe_portfolios | 20 credits per record returned (minimum 1 per URL) |
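The pricing rules above can be turned into a rough client-side estimate. The sketch below is hedged: estimate_credits is a hypothetical helper, and it assumes "minimum 1 per URL" means each URL is billed for at least one record. The price field on a completed run remains the authoritative figure.

```python
# Rates taken from the pricing table above.
PER_URL_CREDITS = {"parent_companies": 70, "pe_ownership": 70}
PER_RECORD_CREDITS = {"corporate_families": 20, "pe_portfolios": 20}


def estimate_credits(output_type: str, records_per_url: list) -> int:
    """Estimate the cost of one run.

    records_per_url holds one record count per input URL. For per-URL
    output types only the number of URLs matters.
    """
    if output_type in PER_URL_CREDITS:
        return PER_URL_CREDITS[output_type] * len(records_per_url)
    if output_type in PER_RECORD_CREDITS:
        rate = PER_RECORD_CREDITS[output_type]
        # Assumption: every URL is billed for at least one record.
        return sum(rate * max(1, count) for count in records_per_url)
    raise ValueError(f"Unsupported output type: {output_type}")
```

Three URLs run through parent_companies come to 210 credits under this sketch, which matches the price shown in the example responses in this document. For the per-record types, record counts are only known once a run has completed, so any earlier estimate is a guess.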
Steps in detail
Create a batch
POST /v1/batch
Submit a list of URLs to create a new batch. URLs are validated; any that can't be parsed are returned in invalid_urls but don't prevent batch creation. Save the returned batch_id; you'll need it for the next step.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | array of strings | Required | List of company URLs to enrich. Maximum 5,000 per batch. |
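Because a batch accepts at most 5,000 URLs, a longer list has to be split across several batches. A minimal sketch of that split, where chunk_urls is a hypothetical helper and not part of the API:

```python
from typing import Iterator, List

MAX_URLS_PER_BATCH = 5000  # limit stated in the parameter table above


def chunk_urls(urls: List[str], size: int = MAX_URLS_PER_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of urls, each at most size long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

Each yielded chunk can be sent as the urls value of its own create-batch request. Every batch then gets its own batch_id and has to be processed and downloaded separately.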
Example request
{
  "urls": [
    "https://example.com",
    "https://subsidiary.example.com",
    "https://other.example.com"
  ]
}
Example response
{
  "status": "Success",
  "batch_id": "fa25641c-c650-4e3e-ba43-1455f730ba19",
  "total_url_count": 3,
  "valid_url_count": 3,
  "invalid_url_count": 0,
  "invalid_urls": []
}
Code example
curl -X POST https://api.magellandata.io/v1/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com",
      "https://subsidiary.example.com",
      "https://other.example.com"
    ]
  }'
Process the batch
POST /v1/batch/{batch_id}/process
Kick off async processing for one or more output types. Each output type becomes a separate run with its own run_id and price. The call returns immediately; the actual processing happens in the background.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| output_types | array of strings | Required | One or more of: parent_companies, corporate_families, pe_ownership, pe_portfolios |
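Since an unsupported output_type is rejected with a 400, it can help to validate the list before sending it and to index the returned runs by run_id for later polling. The sketch below assumes the response shape shown in this section; build_process_payload and index_runs are hypothetical helper names.

```python
SUPPORTED_OUTPUT_TYPES = (
    "parent_companies",
    "corporate_families",
    "pe_ownership",
    "pe_portfolios",
)


def build_process_payload(output_types: list) -> dict:
    """Validate output types locally and build the request body."""
    if not output_types:
        raise ValueError("At least one output type is required")
    unknown = [t for t in output_types if t not in SUPPORTED_OUTPUT_TYPES]
    if unknown:
        raise ValueError(f"Unsupported output types: {unknown}")
    return {"output_types": list(output_types)}


def index_runs(process_response: dict) -> dict:
    """Map each run_id in a process response to its output_type."""
    return {run["run_id"]: run["output_type"] for run in process_response["runs"]}
```

Keeping the run_id to output_type mapping around makes it easy to name the downloaded CSV files later.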
Example request
{
  "output_types": [
    "parent_companies",
    "corporate_families"
  ]
}
Example response
{
  "status": "Success",
  "batch_id": "fa25641c-c650-4e3e-ba43-1455f730ba19",
  "runs": [
    {
      "run_id": "184ec4d7-4cf0-4129-8a29-02bc78ba9045",
      "output_type": "parent_companies",
      "run_status": "processing",
      "input_records": 3
    },
    {
      "run_id": "33bbf2ff-d039-4bce-985c-3e7d3c99c64e",
      "output_type": "corporate_families",
      "run_status": "processing",
      "input_records": 3
    }
  ],
  "message": "Runs are processing asynchronously. Poll GET /batch/runs/{run_id} until run_status='completed', then POST /batch/runs/{run_id}/download."
}
Code example
curl -X POST https://api.magellandata.io/v1/batch/{batch_id}/process \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "output_types": [
      "parent_companies",
      "corporate_families"
    ]
  }'
List runs for a batch (optional)
GET /v1/batch/{batch_id}/runs
Return all runs ever created for a batch, with their current status and pricing. Useful for recovering run_ids if a client crashed, or for auditing what's been processed.
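After a client crash, the list response can be used to work out which runs are ready but not yet purchased, and how many credits downloading them would take. A minimal sketch, assuming the response fields shown in this section; runs_awaiting_purchase and credits_needed are hypothetical helpers.

```python
def runs_awaiting_purchase(list_response: dict) -> list:
    """Return completed runs that have not been purchased yet."""
    return [
        run
        for run in list_response["runs"]
        if run.get("run_status") == "completed" and not run.get("is_purchased")
    ]


def credits_needed(runs: list) -> int:
    """Sum the price of the given runs (a missing price counts as 0)."""
    return sum(run.get("price", 0) for run in runs)
```

Comparing credits_needed against your balance before downloading avoids running into a 402 halfway through a set of runs.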
Request body
No request body. Send only the Authorization header.
Example response
{
  "status": "Success",
  "batch_id": "fa25641c-c650-4e3e-ba43-1455f730ba19",
  "runs": [
    {
      "run_id": "184ec4d7-4cf0-4129-8a29-02bc78ba9045",
      "batch_id": "fa25641c-c650-4e3e-ba43-1455f730ba19",
      "output_type": "parent_companies",
      "run_status": "completed",
      "input_records": 3,
      "output_count": 3,
      "priced_records": 3,
      "price_per_record": 70,
      "price": 210,
      "is_purchased": false
    }
  ]
}
Code example
curl -X GET https://api.magellandata.io/v1/batch/{batch_id}/runs \
  -H "Authorization: Bearer YOUR_API_KEY"
Get run status
GET /v1/batch/runs/{run_id}
Check the status of a specific run. Poll this until run_status is "completed" (or "failed"). Once completed, the price field tells you what download will cost.
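The polling described above benefits from a timeout and a growing delay, so that a stuck run does not loop forever. A minimal sketch: wait_for_run is a hypothetical helper, the delay and timeout values are arbitrary assumptions, and fetch_status stands for any callable that performs the GET and returns the parsed JSON body.

```python
import time
from typing import Callable


def wait_for_run(
    fetch_status: Callable[[], dict],
    timeout_s: float = 600.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until run_status is "completed" or "failed", then return the body.

    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status["run_status"] in ("completed", "failed"):
            return status
        if clock() + delay > deadline:
            raise TimeoutError("Run did not finish before the timeout")
        sleep(delay)
        # Double the delay after each attempt, up to max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

With the requests library, fetch_status could be a lambda that calls requests.get on the run URL with the Authorization header and returns .json(). The sleep and clock parameters exist only to make the helper easy to test.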
Request body
No request body. Send only the Authorization header.
Example response
{
  "status": "Success",
  "run_id": "184ec4d7-4cf0-4129-8a29-02bc78ba9045",
  "batch_id": "fa25641c-c650-4e3e-ba43-1455f730ba19",
  "output_type": "parent_companies",
  "run_status": "completed",
  "input_records": 3,
  "output_count": 3,
  "priced_records": 3,
  "price_per_record": 70,
  "price": 210,
  "is_purchased": false
}
Code example
curl -X GET https://api.magellandata.io/v1/batch/runs/{run_id} \
  -H "Authorization: Bearer YOUR_API_KEY"
Download run results
POST /v1/batch/runs/{run_id}/download
Charges credits on the first call per run, then returns a presigned S3 URL that points to the run's CSV. Follow the URL with a plain GET (no Authorization header, because the URL is already signed) to download the results. URLs expire after 1 hour, but you can call this endpoint again at any time to get a fresh URL without paying again: once a run is purchased, it stays purchased.
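Because presigned URLs expire, a long-running client may want to reuse a URL while it is valid and request a new one only when needed. The sketch below is one way to do that under stated assumptions: DownloadLink is a hypothetical class, the 60-second safety margin is arbitrary, and request_link stands for any callable that performs the download POST and returns the parsed JSON body.

```python
import time
from typing import Callable, Optional


class DownloadLink:
    """Cache a presigned URL and refresh it shortly before it expires."""

    def __init__(
        self,
        request_link: Callable[[], dict],
        clock: Callable[[], float] = time.monotonic,
        safety_margin_s: float = 60.0,
    ) -> None:
        self._request_link = request_link
        self._clock = clock
        self._margin = safety_margin_s
        self._url: Optional[str] = None
        self._expires_at = 0.0

    def url(self) -> str:
        """Return a usable URL, calling request_link only when needed."""
        now = self._clock()
        if self._url is None or now >= self._expires_at:
            body = self._request_link()
            self._url = body["download_url"]
            # Treat the URL as expired a little early to avoid races.
            self._expires_at = now + body["expires_in_seconds"] - self._margin
        return self._url
```

Only the first request for a run charges credits, so refreshing the link this way costs nothing extra once the run is purchased.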
Request body
No request body. Send only the Authorization header.
Example response
{
  "status": "Success",
  "run_id": "184ec4d7-4cf0-4129-8a29-02bc78ba9045",
  "output_type": "parent_companies",
  "output_count": 3,
  "price": 210,
  "download_url": "https://magellan-selfserve-requests.s3.amazonaws.com/batch_runs/.../parent_companies.csv?X-Amz-Algorithm=...&X-Amz-Signature=...",
  "download_format": "csv",
  "expires_in_seconds": 3600
}
Code example
curl -X POST https://api.magellandata.io/v1/batch/runs/{run_id}/download \
  -H "Authorization: Bearer YOUR_API_KEY"
Error responses
400 Error
Invalid request — empty URLs list, unsupported output_type, or batch exceeds 5,000 URLs
402 Error
Insufficient credits (returned on download when balance is below the run's price)
403 Error
Forbidden — Invalid or missing API key
404 Error
Batch or run not found, or not owned by your account
409 Error
Run is not complete (returned when downloading a run still processing or failed)
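A client can translate these status codes into actionable messages in one place. The sketch below is an illustration only: ERROR_HINTS and explain_error are hypothetical names, and the hint text paraphrases the descriptions above.

```python
# Hints paraphrased from the error table above.
ERROR_HINTS = {
    400: "Invalid request: check for an empty URL list, an unsupported "
         "output_type, or more than 5,000 URLs.",
    402: "Insufficient credits: top up before retrying the download.",
    403: "Forbidden: the API key is invalid or missing.",
    404: "Not found: the batch or run does not exist or belongs to another account.",
    409: "Run is not complete: check the run status before downloading.",
}


def explain_error(status_code: int) -> str:
    """Return a human-readable hint for a documented error status."""
    return ERROR_HINTS.get(
        status_code,
        f"Unexpected status {status_code}; inspect the response body.",
    )
```

Note that a 409 is worth retrying only while the run is still processing; a failed run will keep returning it.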
Complete Python example
End-to-end script showing batch creation, processing, polling, and download with error handling.
import time

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.magellandata.io/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

urls = ["https://example.com", "https://subsidiary.example.com", "https://other.example.com"]

# 1. Create the batch
r = requests.post(f"{BASE_URL}/batch", headers=HEADERS, json={"urls": urls})
r.raise_for_status()
batch_id = r.json()["batch_id"]
print(f"Batch created: {batch_id}")

# 2. Kick off processing for the output types you want
r = requests.post(
    f"{BASE_URL}/batch/{batch_id}/process",
    headers=HEADERS,
    json={"output_types": ["parent_companies", "corporate_families"]},
)
r.raise_for_status()
runs = r.json()["runs"]

# 3. Poll each run until it completes or fails
completed = {}
pending = {run["run_id"]: run["output_type"] for run in runs}
while pending:
    # Iterate over a copy of the keys because entries are deleted in the loop
    for run_id in list(pending.keys()):
        r = requests.get(f"{BASE_URL}/batch/runs/{run_id}", headers=HEADERS)
        r.raise_for_status()
        status = r.json()
        if status["run_status"] == "completed":
            completed[run_id] = status
            del pending[run_id]
            print(f"{status['output_type']}: completed, {status['price']} credits")
        elif status["run_status"] == "failed":
            print(f"{pending[run_id]}: FAILED")
            del pending[run_id]
    if pending:
        time.sleep(5)

# 4. Download each completed run in two phases:
#    a) POST to charge credits and get a presigned S3 URL
#    b) GET the URL to stream the CSV to disk
for run_id, status in completed.items():
    output_type = status["output_type"]

    # 4a. Purchase + URL
    r = requests.post(f"{BASE_URL}/batch/runs/{run_id}/download", headers=HEADERS)
    if r.status_code == 402:
        err = r.json()
        print(f"Insufficient credits: have {err['available_credits']}, need {err['required_credits']}")
        continue
    r.raise_for_status()
    download_url = r.json()["download_url"]

    # 4b. Stream the CSV from S3.
    # Do NOT pass headers=HEADERS: the URL is presigned and S3 will
    # reject an extra Authorization header.
    csv_path = f"{output_type}.csv"
    with requests.get(download_url, stream=True) as resp:
        resp.raise_for_status()
        with open(csv_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)
    print(f"{output_type}: saved to {csv_path}")