Every website has metadata hidden in its HTML — titles, descriptions, Open Graph tags, favicons, Twitter cards, and more. Extracting this metadata is essential for building link previews, SEO analysis tools, competitive intelligence dashboards, and content aggregators.
In this guide, we’ll show you how to use the ToolCenter Metadata API to extract structured metadata from any URL, with practical examples and real-world use cases.
What Is Website Metadata?
Website metadata is information embedded in a page’s HTML <head> section that describes the page’s content. It includes:
Standard HTML Meta Tags
<title>Page Title</title>
<meta name="description" content="Page description for search engines">
<meta name="keywords" content="keyword1, keyword2">
<meta name="author" content="Author Name">
<meta name="robots" content="index, follow">
Open Graph Tags (Facebook/LinkedIn)
<meta property="og:title" content="Page Title">
<meta property="og:description" content="Description for social sharing">
<meta property="og:image" content="https://example.com/image.jpg">
<meta property="og:url" content="https://example.com/page">
<meta property="og:type" content="article">
<meta property="og:site_name" content="Example Site">
Twitter Card Tags
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Page Title">
<meta name="twitter:description" content="Description for Twitter">
<meta name="twitter:image" content="https://example.com/twitter-image.jpg">
<meta name="twitter:site" content="@username">
Other Metadata
<link rel="icon" href="/favicon.ico">
<link rel="canonical" href="https://example.com/page">
<meta name="theme-color" content="#ffffff">
<meta name="viewport" content="width=device-width, initial-scale=1">
Why Extract Metadata with an API?
The DIY Challenge
You could scrape metadata yourself with libraries like BeautifulSoup or Cheerio, but you’ll quickly run into problems:
- JavaScript-rendered pages — Many modern sites render metadata client-side; simple HTML parsing misses it
- Redirects and canonical URLs — Following redirect chains correctly is tricky
- Rate limiting and blocking — Sites block scrapers; APIs handle this with rotating proxies
- Character encoding — UTF-8, ISO-8859-1, and other encodings need proper handling
- Malformed HTML — Real-world HTML is messy; robust parsing is non-trivial
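To make the DIY route concrete, here is a minimal sketch that uses only Python's standard-library `html.parser`. The class and function names (`HeadMetadataParser`, `extract_static_metadata`) are illustrative and not part of any ToolCenter SDK. It reads only the HTML string it is given, so it does nothing about JavaScript rendering, redirects, blocking, or encoding detection.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collects <title>, <meta>, and <link> values from static HTML."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.meta = {}   # name/property -> content
        self.links = {}  # rel -> href
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use property=..., most other tags use name=...
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "link" and attrs.get("rel") and attrs.get("href"):
            self.links[attrs["rel"].lower()] = attrs["href"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def extract_static_metadata(html: str) -> dict:
    """Parse an HTML string and return its title, meta tags, and link tags."""
    parser = HeadMetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "meta": parser.meta, "links": parser.links}
```

This is enough for well-formed, server-rendered pages. Each problem in the list above would need its own extra handling on top of it.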
The API Advantage
The ToolCenter Metadata API handles all of these challenges:
- Renders JavaScript-heavy pages with a real browser
- Follows redirects and resolves canonical URLs
- Handles rate limiting and retries automatically
- Returns clean, structured JSON
- Extracts favicons, OG tags, Twitter cards, and more
Use Cases
1. Link Preview Generation
Build rich link previews like the ones in Slack, Discord, or iMessage.
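As a rough illustration, the sketch below turns a metadata response into a small HTML preview card. It assumes the response shape shown in the Response Format section of this guide. The `render_link_preview` name and the `link-preview` CSS class are hypothetical choices for the example.

```python
from html import escape
from urllib.parse import urlparse


def render_link_preview(metadata: dict, url: str) -> str:
    """Build an HTML preview card, preferring Open Graph values."""
    og = metadata.get("og") or {}
    twitter = metadata.get("twitter") or {}
    title = og.get("title") or metadata.get("title") or urlparse(url).hostname or url
    description = og.get("description") or metadata.get("description") or ""
    image = og.get("image") or twitter.get("image")

    # Escape every value: metadata comes from third-party pages.
    parts = [f'<a class="link-preview" href="{escape(url, quote=True)}">']
    if image:
        parts.append(f'<img src="{escape(image, quote=True)}" alt="">')
    parts.append(f"<strong>{escape(title)}</strong>")
    if description:
        parts.append(f"<p>{escape(description)}</p>")
    parts.append("</a>")
    return "".join(parts)
```

Escaping matters here because titles and descriptions are untrusted input taken from other sites.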
2. SEO Analysis
Build an SEO audit tool that checks:
- Does the page have a title? Is it the right length (50-60 chars)?
- Is there a meta description? Is it 150-160 characters?
- Are OG tags properly configured?
- Is there a canonical URL?
- What’s the favicon?
3. Competitive Analysis
Monitor competitors’ pages for:
- Title and description changes (A/B testing detection)
- New OG images (marketing campaign tracking)
- Schema markup changes
- Technology stack detection
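One way to detect such changes is to store a metadata snapshot per page and compare it with the next one. The sketch below assumes snapshots are plain dictionaries in the response shape shown later in this guide. The helper names `flatten` and `diff_metadata` are made up for the example.

```python
def flatten(metadata: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted paths, e.g. {"og": {"title": ..}} -> "og.title"."""
    flat = {}
    for key, value in metadata.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def diff_metadata(previous: dict, current: dict) -> dict:
    """Return {path: (old_value, new_value)} for every field that changed."""
    old, new = flatten(previous), flatten(current)
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(old.keys() | new.keys())
        if old.get(path) != new.get(path)
    }
```

A field that appears or disappears shows up with `None` on one side of the pair, so new OG images and removed tags are reported the same way as edited titles.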
4. Content Aggregation
Build RSS-like feeds from websites that don’t offer RSS:
- Extract titles and descriptions from article pages
- Pull OG images for visual feeds
- Get author and publication dates
5. Bookmark Managers
Create rich bookmarks with automatically extracted metadata — title, description, favicon, and preview image.
ToolCenter Metadata API
API Endpoint
POST https://toolcenter.dev/api/v1/metadata
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to extract metadata from |
Response Format
{
  "url": "https://example.com",
  "canonical_url": "https://example.com/",
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "author": null,
  "favicon": "https://example.com/favicon.ico",
  "og": {
    "title": "Example Domain",
    "description": "This domain is for illustrative examples.",
    "image": "https://example.com/og-image.jpg",
    "url": "https://example.com",
    "type": "website",
    "site_name": "Example"
  },
  "twitter": {
    "card": "summary_large_image",
    "title": "Example Domain",
    "description": "This domain is for illustrative examples.",
    "image": "https://example.com/twitter-image.jpg",
    "site": "@example"
  },
  "meta_tags": {
    "viewport": "width=device-width, initial-scale=1",
    "theme-color": "#ffffff",
    "robots": "index, follow"
  },
  "links": {
    "canonical": "https://example.com/",
    "icon": "https://example.com/favicon.ico",
    "apple-touch-icon": "https://example.com/apple-touch-icon.png"
  }
}
Code Examples
cURL
curl -X POST "https://toolcenter.dev/api/v1/metadata" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com"}' | jq .
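If you would rather not install an SDK, the same request can be sketched with Python's standard library. The endpoint, headers, and body below mirror the cURL command. The function names are illustrative, and the 30-second timeout is an arbitrary assumption rather than a documented value.

```python
import json
import urllib.request

# Endpoint taken from the cURL example in this guide.
API_URL = "https://toolcenter.dev/api/v1/metadata"


def build_metadata_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one URL."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def fetch_metadata(target_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_metadata_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first function easy to unit test without network access.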
Node.js — Link Preview Builder
const ToolCenter = require('devtoolbox-sdk');

const client = new ToolCenter('YOUR_API_KEY');

// Prefer Open Graph values, then fall back to the standard tags.
async function buildLinkPreview(url) {
  const metadata = await client.metadata(url);
  return {
    title: metadata.og?.title || metadata.title || 'Untitled',
    description: metadata.og?.description || metadata.description || '',
    image: metadata.og?.image || metadata.twitter?.image || null,
    favicon: metadata.favicon || null,
    siteName: metadata.og?.site_name || new URL(url).hostname,
    url: metadata.canonical_url || url,
  };
}

// Generate a link preview. CommonJS modules do not support top-level await,
// so the promise is handled with .then() instead.
buildLinkPreview('https://github.com')
  .then((preview) => console.log(preview))
  .catch((err) => console.error(err));
// {
//   title: "GitHub: Let's build from here",
//   description: "GitHub is where over 100 million developers...",
//   image: "https://github.githubassets.com/images/modules/site/social-cards/...",
//   favicon: "https://github.githubassets.com/favicons/favicon.svg",
//   siteName: "GitHub",
//   url: "https://github.com"
// }
Python — SEO Analyzer
from devtoolbox import ToolCenter

client = ToolCenter("YOUR_API_KEY")


def analyze_seo(url: str) -> dict:
    """Analyze a URL's SEO metadata and return a report."""
    meta = client.metadata(url=url)
    issues = []
    score = 100

    # Check title ("or" also covers fields the API returns as null)
    title = meta.get("title") or ""
    if not title:
        issues.append("❌ Missing page title")
        score -= 20
    elif len(title) > 60:
        issues.append(f"⚠️ Title too long ({len(title)} chars, max 60)")
        score -= 5
    elif len(title) < 30:
        issues.append(f"⚠️ Title too short ({len(title)} chars, min 30)")
        score -= 5

    # Check description
    desc = meta.get("description") or ""
    if not desc:
        issues.append("❌ Missing meta description")
        score -= 15
    elif len(desc) > 160:
        issues.append(f"⚠️ Description too long ({len(desc)} chars, max 160)")
        score -= 5

    # Check OG tags
    og = meta.get("og") or {}
    if not og.get("title"):
        issues.append("⚠️ Missing og:title")
        score -= 10
    if not og.get("description"):
        issues.append("⚠️ Missing og:description")
        score -= 10
    if not og.get("image"):
        issues.append("❌ Missing og:image: social shares will look bad")
        score -= 15

    # Check Twitter card
    twitter = meta.get("twitter") or {}
    if not twitter.get("card"):
        issues.append("⚠️ Missing twitter:card")
        score -= 5

    # Check favicon
    if not meta.get("favicon"):
        issues.append("⚠️ Missing favicon")
        score -= 5

    return {
        "url": url,
        "score": max(0, score),
        "title": title,
        "description": desc,
        "issues": issues,
        "og_complete": bool(og.get("title") and og.get("description") and og.get("image")),
    }


# Run analysis
report = analyze_seo("https://example.com")
print(f"SEO Score: {report['score']}/100")
for issue in report["issues"]:
    print(f"  {issue}")
PHP — Bookmark Manager
use ToolCenter\Client;

$client = new Client('YOUR_API_KEY');

// Named functions cannot capture variables with `use`, so the client is passed in.
function createBookmark(Client $client, string $url): array {
    $metadata = $client->metadata($url);
    return [
        'url' => $url,
        'title' => $metadata['og']['title'] ?? $metadata['title'] ?? 'Untitled',
        'description' => $metadata['og']['description'] ?? $metadata['description'] ?? '',
        'image' => $metadata['og']['image'] ?? $metadata['twitter']['image'] ?? null,
        'favicon' => $metadata['favicon'] ?? null,
        'site_name' => $metadata['og']['site_name'] ?? parse_url($url, PHP_URL_HOST),
        'saved_at' => date('Y-m-d H:i:s'),
    ];
}

// Save a bookmark
$bookmark = createBookmark($client, 'https://github.com');
echo "Saved: {$bookmark['title']}\n";

// Store in database
// DB::table('bookmarks')->insert($bookmark);
Bulk Metadata Extraction
Extract metadata from multiple URLs at once:
const ToolCenter = require('devtoolbox-sdk');

const client = new ToolCenter('YOUR_API_KEY');

const urls = [
  'https://github.com',
  'https://stackoverflow.com',
  'https://dev.to',
  'https://news.ycombinator.com',
];

// Wrapped in an async function because CommonJS modules do not support top-level await.
async function main() {
  const results = await client.bulk('metadata', urls.map((url) => ({ url })));
  results.forEach((result) => {
    console.log(`${result.title} - ${result.og?.description || 'No description'}`);
  });
}

main().catch(console.error);
Best Practices
1. Cache Results
Metadata doesn’t change often. Cache results for 24-48 hours to reduce API calls:
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours
const cache = new Map();

async function getMetadata(url) {
  const cached = cache.get(url);
  if (cached && Date.now() - cached.timestamp < CACHE_TTL_MS) {
    return cached.data;
  }
  const data = await client.metadata(url);
  cache.set(url, { data, timestamp: Date.now() });
  return data;
}
2. Handle Missing Data Gracefully
Not all websites have complete metadata. Always provide fallbacks:
const title = metadata.og?.title || metadata.title || new URL(url).hostname;
const image = metadata.og?.image || metadata.twitter?.image || '/default-preview.png';
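The same fallback idea can be written as a small reusable helper in Python. This is a sketch, assuming responses are nested dictionaries in which any field may be missing, null, or an empty string. `first_present` is a hypothetical name, not an SDK function.

```python
def first_present(metadata: dict, *paths: str, default=None):
    """Return the first non-empty value found at the given dotted paths."""
    for path in paths:
        value = metadata
        for key in path.split("."):
            # Stop descending as soon as a level is missing or null.
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        if value not in (None, ""):
            return value
    return default
```

For example, `first_present(metadata, "og.image", "twitter.image", default="/default-preview.png")` expresses the image fallback chain in one call.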
3. Validate URLs
Always validate and sanitize URLs before passing them to the API:
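from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Return True for absolute http(s) URLs that include a host."""
    try:
        result = urlparse(url)
    except ValueError:  # raised for malformed input such as a bad IPv6 literal
        return False
    return result.scheme in ("http", "https") and bool(result.netloc)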
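Beyond a yes/no check, it can help to normalize URLs so that trivially different spellings of the same address share one cache entry and one API call. The sketch below is one possible approach using `urllib.parse`. The `normalize_url` name and the specific rules (lowercase the host, default the path to `/`, drop the fragment) are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a canonical-ish form of an http(s) URL, or raise ValueError."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Unsupported or incomplete URL: {url!r}")
    return urlunsplit((
        parts.scheme,           # urlsplit already lowercases the scheme
        parts.netloc.lower(),   # host names are case-insensitive
        parts.path or "/",
        parts.query,
        "",                     # drop the fragment; it is never sent to the server
    ))
```

Note that lowercasing the whole network location would also lowercase any embedded `user:password@` credentials, so this sketch is only suitable for URLs without them.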
4. Respect Rate Limits
When you need metadata for several URLs, use the bulk endpoint instead of sending one request per URL. Batching cuts the number of HTTP round trips.
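Before calling a bulk endpoint, you will usually want to de-duplicate the URL list and split it into batches. The sketch below shows one way to do that. The default batch size of 25 is an assumption for illustration only, since this guide does not state the bulk endpoint's actual limit.

```python
from typing import Iterator


def batches(urls: list[str], size: int = 25) -> Iterator[list[str]]:
    """Yield de-duplicated URLs in lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    unique = list(dict.fromkeys(urls))  # de-duplicate, keeping first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]
```

Each yielded list can then be passed to a single bulk call, with a short pause between batches if you approach your plan's quota.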
Pricing
Metadata extraction is included in all ToolCenter plans — no extra charge:
| Plan | Price | Monthly Requests |
|---|---|---|
| Free | $0 | 100 |
| Starter | $9/mo | 5,000 |
| Pro | $29/mo | 25,000 |
| Business | $79/mo | 100,000 |
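To estimate which plan fits, multiply your expected monthly lookups by the share that misses your cache and compare the result with each quota. The sketch below encodes the table above. The `cheapest_plan` helper and the `cache_hit_rate` parameter are illustrative, and real usage will vary.

```python
import math

# (name, monthly price in USD, monthly requests), copied from the pricing table
PLANS = [
    ("Free", 0, 100),
    ("Starter", 9, 5_000),
    ("Pro", 29, 25_000),
    ("Business", 79, 100_000),
]


def cheapest_plan(lookups_per_month: int, cache_hit_rate: float = 0.0):
    """Return (plan, price, api_requests) for the cheapest plan that fits, or None."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    # Only cache misses reach the API.
    api_requests = math.ceil(lookups_per_month * (1 - cache_hit_rate))
    for name, price, quota in PLANS:
        if api_requests <= quota:
            return name, price, api_requests
    return None  # volume exceeds every listed plan
```

With 20,000 lookups a month, a 75% cache hit rate brings the API volume down to 5,000 requests, which fits the Starter quota instead of Pro.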
Conclusion
Extracting website metadata is a fundamental building block for many developer tools — link previews, SEO analyzers, bookmark managers, and content aggregators. The ToolCenter Metadata API makes it simple with a single endpoint that returns clean, structured JSON.
Combined with ToolCenter’s screenshot, PDF, QR code, and OG image tools, you have everything you need to build powerful web automation workflows.
Check the API documentation for detailed endpoint references and response schemas.