Extract Website Metadata with an API: Title, Description, OG Tags

Every website has metadata hidden in its HTML — titles, descriptions, Open Graph tags, favicons, Twitter cards, and more. Extracting this metadata is essential for building link previews, SEO analysis tools, competitive intelligence dashboards, and content aggregators.

In this guide, we’ll show you how to use the ToolCenter Metadata API to extract structured metadata from any URL, with practical examples and real-world use cases.

What Is Website Metadata?

Website metadata is information embedded in a page’s HTML <head> section that describes the page’s content. It includes:

Standard HTML Meta Tags

<title>Page Title</title>
<meta name="description" content="Page description for search engines">
<meta name="keywords" content="keyword1, keyword2">
<meta name="author" content="Author Name">
<meta name="robots" content="index, follow">

Open Graph Tags (Facebook/LinkedIn)

<meta property="og:title" content="Page Title">
<meta property="og:description" content="Description for social sharing">
<meta property="og:image" content="https://example.com/image.jpg">
<meta property="og:url" content="https://example.com/page">
<meta property="og:type" content="article">
<meta property="og:site_name" content="Example Site">

Twitter Card Tags

<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Page Title">
<meta name="twitter:description" content="Description for Twitter">
<meta name="twitter:image" content="https://example.com/twitter-image.jpg">
<meta name="twitter:site" content="@username">

Other Metadata

<link rel="icon" href="/favicon.ico">
<link rel="canonical" href="https://example.com/page">
<meta name="theme-color" content="#ffffff">
<meta name="viewport" content="width=device-width, initial-scale=1">

Why Extract Metadata with an API?

The DIY Challenge

You could scrape metadata yourself with libraries like BeautifulSoup or Cheerio, but you’ll quickly run into problems:

JavaScript-rendered pages — Many modern sites render metadata client-side; simple HTML parsing misses it
Redirects and canonical URLs — Following redirect chains correctly is tricky
Rate limiting and blocking — Sites block scrapers; APIs handle this with rotating proxies
Character encoding — UTF-8, ISO-8859-1, and other encodings need proper handling
Malformed HTML — Real-world HTML is messy; robust parsing is non-trivial
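
To make the DIY approach concrete, here is a minimal sketch of a static extractor built on Python's standard-library `html.parser`. The class and function names (`HeadMetadataParser`, `parse_head`) are illustrative and not part of any ToolCenter SDK. It only sees tags present in the raw HTML, so it shares the limitations listed above: it does no JavaScript rendering, redirect handling, or encoding detection.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect <title>, <meta> and <link> values from static HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}   # keyed by the name= or property= attribute
        self.links = {}  # keyed by the rel= attribute
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "link" and attrs.get("rel") and attrs.get("href"):
            self.links[attrs["rel"].lower()] = attrs["href"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def parse_head(html: str) -> dict:
    """Parse an HTML string and return its title, meta tags and link tags."""
    parser = HeadMetadataParser()
    parser.feed(html)
    return {"title": parser.title.strip(), "meta": parser.meta, "links": parser.links}


sample = """<html><head>
<title>Page Title</title>
<meta name="description" content="Page description for search engines">
<meta property="og:title" content="Page Title">
<link rel="canonical" href="https://example.com/page">
</head><body></body></html>"""

print(parse_head(sample))
```

This works for well-formed, server-rendered pages, which is why it is a reasonable starting point and also why it breaks down on the cases above.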

The API Advantage

The ToolCenter Metadata API handles all of these challenges:

Renders JavaScript-heavy pages with a real browser
Follows redirects and resolves canonical URLs
Handles rate limiting and retries automatically
Returns clean, structured JSON
Extracts favicons, OG tags, Twitter cards, and more

Use Cases

1. Link Preview Generation

Build rich link previews like those in Slack, Discord, or iMessage, using a page's title, description, and preview image.

2. SEO Analysis

Build an SEO audit tool that checks:

Does the page have a title? Is it the right length (50-60 chars)?
Is there a meta description? Is it 150-160 characters?
Are OG tags properly configured?
Is there a canonical URL?
What’s the favicon?

3. Competitive Analysis

Monitor competitors’ pages for:

Title and description changes (A/B testing detection)
New OG images (marketing campaign tracking)
Schema markup changes
Technology stack detection
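
One way to detect such changes is to store a metadata snapshot per page and compare it with the next one. The sketch below assumes each snapshot is a dict shaped like the API's JSON response; `diff_metadata` is a hypothetical helper, not an SDK function.

```python
def diff_metadata(old: dict, new: dict, prefix: str = "") -> dict:
    """Return {dotted_field: (old_value, new_value)} for every changed field."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        path = f"{prefix}{key}"
        if isinstance(before, dict) or isinstance(after, dict):
            # Recurse into nested groups such as "og" or "twitter".
            changes.update(diff_metadata(
                before if isinstance(before, dict) else {},
                after if isinstance(after, dict) else {},
                f"{path}.",
            ))
        elif before != after:
            changes[path] = (before, after)
    return changes


yesterday = {"title": "Spring Sale", "og": {"image": "https://example.com/a.jpg"}}
today = {"title": "Summer Sale", "og": {"image": "https://example.com/b.jpg"}}

for field_name, (before, after) in diff_metadata(yesterday, today).items():
    print(f"{field_name}: {before!r} -> {after!r}")
```

Running the comparison on a schedule and alerting on non-empty results is usually enough for basic monitoring.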

4. Content Aggregation

Build RSS-like feeds from websites that don’t offer RSS:

Extract titles and descriptions from article pages
Pull OG images for visual feeds
Get author and publication dates

5. Bookmark Managers

Create rich bookmarks with automatically extracted metadata — title, description, favicon, and preview image.

ToolCenter Metadata API

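API Endpoint

POST https://toolcenter.dev/api/v1/metadata

Send a JSON body and authenticate with an Authorization: Bearer header.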

Parameters

Parameter	Type	Required	Description
`url`	string	Yes	URL to extract metadata from
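
If you prefer not to use an SDK, the request can be built with Python's standard library alone. This is a sketch: the endpoint URL and headers are taken from the cURL example later in this guide, while the function names and the 30-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint as used in this guide's cURL example.
API_URL = "https://toolcenter.dev/api/v1/metadata"


def build_metadata_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a metadata lookup."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def fetch_metadata(target_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_metadata_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.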

Response Format

{
  "url": "https://example.com",
  "canonical_url": "https://example.com/",
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "author": null,
  "favicon": "https://example.com/favicon.ico",
  "og": {
    "title": "Example Domain",
    "description": "This domain is for illustrative examples.",
    "image": "https://example.com/og-image.jpg",
    "url": "https://example.com",
    "type": "website",
    "site_name": "Example"
  },
  "twitter": {
    "card": "summary_large_image",
    "title": "Example Domain",
    "description": "This domain is for illustrative examples.",
    "image": "https://example.com/twitter-image.jpg",
    "site": "@example"
  },
  "meta_tags": {
    "viewport": "width=device-width, initial-scale=1",
    "theme-color": "#ffffff",
    "robots": "index, follow"
  },
  "links": {
    "canonical": "https://example.com/",
    "icon": "https://example.com/favicon.ico",
    "apple-touch-icon": "https://example.com/apple-touch-icon.png"
  }
}
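
If you want typed access to these fields, one option is to load the JSON into a small dataclass. The sketch below mirrors the field names in the sample response above; `PageMetadata` is a hypothetical type, and treating a null group as an empty dict is a design choice of this sketch rather than documented API behavior.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Optional

# Nested groups in the sample response.
GROUP_FIELDS = ("og", "twitter", "meta_tags", "links")


@dataclass
class PageMetadata:
    """Typed view of the response; every field except url is optional."""
    url: str
    canonical_url: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None
    author: Optional[str] = None
    favicon: Optional[str] = None
    og: dict = field(default_factory=dict)
    twitter: dict = field(default_factory=dict)
    meta_tags: dict = field(default_factory=dict)
    links: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, raw: str) -> "PageMetadata":
        payload = json.loads(raw)
        known = {f.name for f in fields(cls)}
        cleaned = {}
        for key, value in payload.items():
            if key not in known:
                continue  # ignore fields this sketch does not model
            if key in GROUP_FIELDS and value is None:
                value = {}  # treat a null group as an empty one
            cleaned[key] = value
        return cls(**cleaned)


raw = (
    '{"url": "https://example.com", "title": "Example Domain", '
    '"author": null, "og": {"type": "website"}, "twitter": null}'
)
page = PageMetadata.from_json(raw)
print(page.title, page.og.get("type"), page.twitter.get("card"))
```

Because the groups default to empty dicts, lookups such as `page.twitter.get("card")` never raise when a site omits that group.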

Code Examples

cURL

curl -X POST "https://toolcenter.dev/api/v1/metadata" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com"}' | jq .

Node.js — Link Preview Builder

const ToolCenter = require('devtoolbox-sdk');
const client = new ToolCenter('YOUR_API_KEY');

async function buildLinkPreview(url) {
  const metadata = await client.metadata(url);
  
  return {
    title: metadata.og?.title || metadata.title || 'Untitled',
    description: metadata.og?.description || metadata.description || '',
    image: metadata.og?.image || metadata.twitter?.image || null,
    favicon: metadata.favicon || null,
    siteName: metadata.og?.site_name || new URL(url).hostname,
    url: metadata.canonical_url || url,
  };
}

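// Generate a link preview. CommonJS modules (require) have no top-level
// await, so chain on the returned promise instead.
buildLinkPreview('https://github.com')
  .then(preview => console.log(preview))
  .catch(console.error);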
// {
//   title: "GitHub: Let's build from here",
//   description: "GitHub is where over 100 million developers...",
//   image: "https://github.githubassets.com/images/modules/site/social-cards/...",
//   favicon: "https://github.githubassets.com/favicons/favicon.svg",
//   siteName: "GitHub",
//   url: "https://github.com"
// }

Python — SEO Analyzer

from devtoolbox import ToolCenter

client = ToolCenter("YOUR_API_KEY")

def analyze_seo(url: str) -> dict:
    """Analyze a URL's SEO metadata and return a report."""
    meta = client.metadata(url=url)
    
    issues = []
    score = 100
    
    # Check title
    title = meta.get("title") or ""  # the API may return null for missing fields
    if not title:
        issues.append("❌ Missing page title")
        score -= 20
    elif len(title) > 60:
        issues.append(f"⚠️ Title too long ({len(title)} chars, max 60)")
        score -= 5
    elif len(title) < 30:
        issues.append(f"⚠️ Title too short ({len(title)} chars, min 30)")
        score -= 5
    
    # Check description
    desc = meta.get("description") or ""
    if not desc:
        issues.append("❌ Missing meta description")
        score -= 15
    elif len(desc) > 160:
        issues.append(f"⚠️ Description too long ({len(desc)} chars, max 160)")
        score -= 5
    
    # Check OG tags
    og = meta.get("og") or {}
    if not og.get("title"):
        issues.append("⚠️ Missing og:title")
        score -= 10
    if not og.get("description"):
        issues.append("⚠️ Missing og:description")
        score -= 10
    if not og.get("image"):
        issues.append("❌ Missing og:image — social shares will look bad")
        score -= 15
    
    # Check Twitter card
    twitter = meta.get("twitter") or {}
    if not twitter.get("card"):
        issues.append("⚠️ Missing twitter:card")
        score -= 5
    
    # Check favicon
    if not meta.get("favicon"):
        issues.append("⚠️ Missing favicon")
        score -= 5
    
    return {
        "url": url,
        "score": max(0, score),
        "title": title,
        "description": desc,
        "issues": issues,
        "og_complete": bool(og.get("title") and og.get("description") and og.get("image")),
    }

# Run analysis
report = analyze_seo("https://example.com")
print(f"SEO Score: {report['score']}/100")
for issue in report["issues"]:
    print(f"  {issue}")

PHP — Bookmark Manager

use ToolCenter\Client;

$client = new Client('YOUR_API_KEY');

// Named PHP functions cannot capture variables with `use`,
// so the client is passed in as an argument.
function createBookmark(Client $client, string $url): array {
    $metadata = $client->metadata($url);

    return [
        'url' => $url,
        'title' => $metadata['og']['title'] ?? $metadata['title'] ?? 'Untitled',
        'description' => $metadata['og']['description'] ?? $metadata['description'] ?? '',
        'image' => $metadata['og']['image'] ?? $metadata['twitter']['image'] ?? null,
        'favicon' => $metadata['favicon'] ?? null,
        'site_name' => $metadata['og']['site_name'] ?? parse_url($url, PHP_URL_HOST),
        'saved_at' => date('Y-m-d H:i:s'),
    ];
}

// Save a bookmark
$bookmark = createBookmark($client, 'https://github.com');
echo "Saved: {$bookmark['title']}\n";

// Store in database
// DB::table('bookmarks')->insert($bookmark);

Bulk Metadata Extraction

Extract metadata from multiple URLs at once:

const ToolCenter = require('devtoolbox-sdk');
const client = new ToolCenter('YOUR_API_KEY');

const urls = [
  'https://github.com',
  'https://stackoverflow.com',
  'https://dev.to',
  'https://news.ycombinator.com',
];

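// CommonJS modules (require) have no top-level await,
// so run the bulk call inside an async function.
async function main() {
  const results = await client.bulk('metadata',
    urls.map(url => ({ url }))
  );

  results.forEach(result => {
    console.log(`${result.title} — ${result.og?.description || 'No description'}`);
  });
}

main().catch(console.error);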

Best Practices

1. Cache Results

Metadata doesn’t change often. Cache results for 24-48 hours to reduce API calls:

const cache = new Map();

async function getMetadata(url) {
  const cached = cache.get(url);
  if (cached && Date.now() - cached.timestamp < 86400000) {
    return cached.data;
  }

  const data = await client.metadata(url);
  cache.set(url, { data, timestamp: Date.now() });
  return data;
}
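
The same idea can be sketched in Python. The version below takes the fetch function and the clock as parameters so the expiry logic can be tested without calling the API; `MetadataCache` and its parameter names are illustrative, and the 24-hour default simply matches the lower end of the range suggested above.

```python
import time
from typing import Callable, Dict, Tuple


class MetadataCache:
    """In-memory cache that re-fetches an entry once it is older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float = 24 * 60 * 60,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, url: str) -> dict:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, skip the API call
        data = self._fetch(url)
        self._entries[url] = (now, data)
        return data


# Usage with a stand-in fetcher; swap in your real API call.
calls = []

def fake_fetch(url: str) -> dict:
    calls.append(url)
    return {"title": f"Title for {url}"}

cache = MetadataCache(fake_fetch)
cache.get("https://example.com")
cache.get("https://example.com")  # served from the cache
print(len(calls))
```

Note that this cache, like the Map-based one, lives in a single process and never evicts old entries; a shared store such as Redis is a common choice once several workers are involved.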

2. Handle Missing Data Gracefully

Not all websites have complete metadata. Always provide fallbacks:

const title = metadata.og?.title || metadata.title || new URL(url).hostname;
const image = metadata.og?.image || metadata.twitter?.image || '/default-preview.png';
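
In Python, the same fallback chain can be expressed with a small helper that walks dotted paths and returns the first non-empty value. `first_present` is a hypothetical helper written for this example; it assumes the metadata is a plain dict shaped like the API response.

```python
def first_present(metadata: dict, *paths: str, default=None):
    """Return the first non-empty value found at any dotted path, else default."""
    for path in paths:
        value = metadata
        for part in path.split("."):
            if not isinstance(value, dict):
                value = None  # a missing or null group ends this path
                break
            value = value.get(part)
        if value:  # skips None and empty strings
            return value
    return default


metadata = {"title": "Example Domain", "og": {"title": ""}, "twitter": None}

title = first_present(metadata, "og.title", "title", default="Untitled")
image = first_present(metadata, "og.image", "twitter.image", default="/default-preview.png")
print(title, image)
```

Listing the paths in priority order keeps the fallback policy in one place instead of scattering `or` chains through the codebase.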

3. Validate URLs

Always validate and sanitize URLs before passing them to the API:

from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    try:
        result = urlparse(url)
        return all([result.scheme in ('http', 'https'), result.netloc])
    except ValueError:  # raised by urlparse for malformed input such as a bad IPv6 host
        return False

4. Respect Rate Limits

Use the bulk endpoint for multiple URLs instead of sending individual requests. Batching cuts the number of HTTP round trips, which is faster and more efficient.
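
Before sending a bulk request, it usually helps to drop duplicate URLs and split the list into batches. The sketch below does only that preparation step; the helper name and the batch size are assumptions, since this guide does not state a maximum bulk size, so check the API documentation for the actual limit.

```python
from typing import Iterable, Iterator, List


def unique_batches(urls: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield de-duplicated URLs in lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep first-seen order
    for start in range(0, len(unique), batch_size):
        yield unique[start:start + batch_size]


urls = [
    "https://github.com",
    "https://dev.to",
    "https://github.com",  # duplicate, sent only once
    "https://stackoverflow.com",
]

for batch in unique_batches(urls, batch_size=2):  # batch_size is an assumed value
    print(batch)
```

Each yielded batch can then be passed to a single bulk call, so duplicates never count against your monthly request quota.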

Pricing

Metadata extraction is included in all ToolCenter plans — no extra charge:

Plan	Price	Monthly Requests
Free	$0	100
Starter	$9/mo	5,000
Pro	$29/mo	25,000
Business	$79/mo	100,000

Conclusion

Extracting website metadata is a fundamental building block for many developer tools — link previews, SEO analyzers, bookmark managers, and content aggregators. The ToolCenter Metadata API makes it simple with a single endpoint that returns clean, structured JSON.

Combined with ToolCenter’s screenshot, PDF, QR code, and OG image tools, you have everything you need to build powerful web automation workflows.

Check the API documentation for detailed endpoint references and response schemas.
