
Singapore Legal SEO

Built a multi-country legal data platform — 18,000+ court judgments, 44,000+ entity profiles, AI-generated content — across 4 Commonwealth jurisdictions. Effectively $0/month.

February 2026
Astro · Cloudflare D1 · Drizzle · Workers · AI

The Problem

Singapore’s court judgments are buried in eLitigation, a government system operated by CrimsonLogic for the Singapore Courts. It works, but it’s built for lawyers filing cases — not for browsing, researching, or discovering legal information casually.

The search is clunky. There’s no way to browse by court or topic. Individual judgments are walls of unformatted text. For law firms, legal researchers, and anyone trying to understand Singapore case law, the experience is friction from start to finish.

I saw an opportunity: take this public data, structure it properly, and build a fast, searchable, SEO-optimized site that makes Singapore legal information actually accessible.

The Approach

Phase 1 — Scraping the data

eLitigation’s listing endpoint is straightforward — paginated results filtered by court and year. I wrote scrapers to pull case metadata (name, citation, decision date, court, case numbers) and full judgment text from every available case.
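
The crawl loop can be sketched roughly as follows. The endpoint path and query-parameter names here are illustrative, not eLitigation's actual URL scheme:

```typescript
// Hypothetical shape of the listing endpoint — the real path and
// parameter names on eLitigation differ.
const BASE = "https://www.elitigation.sg/gd/Home/Index";

interface ListingQuery {
  court: string; // e.g. "SGHC"
  year: number;
  page: number;
}

function listingUrl({ court, year, page }: ListingQuery): string {
  const params = new URLSearchParams({
    court,
    year: String(year),
    page: String(page),
  });
  return `${BASE}?${params}`;
}

// Walk every page for one court-year until a page comes back empty.
// fetchPage is injected so the crawl logic stays testable offline.
async function* crawlCourtYear(
  court: string,
  year: number,
  fetchPage: (url: string) => Promise<string[]> // case URLs on the page
): AsyncGenerator<string> {
  for (let page = 1; ; page++) {
    const cases = await fetchPage(listingUrl({ court, year, page }));
    if (cases.length === 0) return;
    yield* cases;
  }
}
```

Injecting `fetchPage` also makes it easy to bolt on rate limiting and retries without touching the pagination logic.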

The corpus: 10,470+ judgments spanning 2008–2026, across multiple courts — the High Court (SGHC), Court of Appeal (SGCA), Family Justice Courts (SGHCF, SGFC), and more. The courts publish 430–560 decisions per year on average, and each judgment's HTML is structured with CSS classes like Judg-1, Judg-2, and Judg-Quote-0, which made parsing reliable.

Phase 2 — Processing and storage

Each scraped judgment goes through an AI processing pipeline:

  • Summarization — Claude generates a concise summary of the judgment
  • Catchword extraction — Key legal topics and areas of law
  • Categorization — Court type, area of law, outcome
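
In outline, each judgment gets one structured prompt and a strict-JSON response. The field names and prompt wording below are illustrative, not the production schema:

```typescript
// Shape of what each judgment gets from the AI pass — field names
// are a sketch, not the production schema.
interface CaseAnnotations {
  summary: string;
  catchwords: string[];
  areaOfLaw: string;
  outcome: "allowed" | "dismissed" | "other";
}

// One prompt covers all three tasks; asking for strict JSON keeps
// parsing trivial.
function annotationPrompt(judgmentText: string): string {
  return [
    "You are annotating a court judgment. Return strict JSON with keys:",
    '"summary" (3-4 sentences), "catchwords" (array of legal topics),',
    '"areaOfLaw" (string), "outcome" ("allowed" | "dismissed" | "other").',
    "",
    judgmentText.slice(0, 100_000), // stay under the context window
  ].join("\n");
}

// The model sometimes wraps JSON in prose or a code fence; take the
// outermost object literal before parsing.
function parseAnnotations(raw: string): CaseAnnotations {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  return JSON.parse(raw.slice(start, end + 1)) as CaseAnnotations;
}
```

The tolerant `parseAnnotations` matters more than the prompt: at tens of thousands of calls, even a small rate of fence-wrapped responses would otherwise stall the pipeline.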

Everything lands in Cloudflare D1 with full-text search powered by FTS5. Drizzle ORM handles the data layer — type-safe queries with zero runtime overhead.
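
A minimal sketch of the FTS5 side, assuming a `cases` table with illustrative column names (the real schema lives in Drizzle):

```typescript
// FTS5 shadow table over the cases table; column names are illustrative.
const FTS_DDL = `
  CREATE VIRTUAL TABLE IF NOT EXISTS cases_fts USING fts5(
    name, citation, judgment_text,
    content='cases', content_rowid='id'
  );
`;

// Snippet-highlighting search joined back to the source rows.
function searchSql(): string {
  return `
    SELECT c.id, c.name, c.citation,
           snippet(cases_fts, 2, '<mark>', '</mark>', '…', 24) AS excerpt
    FROM cases_fts
    JOIN cases c ON c.id = cases_fts.rowid
    WHERE cases_fts MATCH ?
    ORDER BY rank
    LIMIT 25;
  `;
}

// FTS5 gives double quotes and operators special meaning, so user input
// is wrapped as a quoted phrase with embedded quotes doubled.
function ftsPhrase(userQuery: string): string {
  return `"${userQuery.replace(/"/g, '""')}"`;
}
```

Escaping the user query as a phrase is the difference between a search box and an FTS5 syntax-error generator.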

Phase 3 — The site

Built a full SSR site on Astro 5 with Tailwind v4, deployed to Cloudflare Pages:

  • Homepage with aggregate stats (cases indexed, courts covered, year range)
  • Case listing with pagination (25 cases per page)
  • Case detail pages with formatted judgment text, AI summaries, catchwords, and Schema.org LegalCase structured data
  • Court index — browse all courts, click into filtered case lists per court
  • Catchword tag pages — browse cases by legal topic
  • Full-text search with FTS5 snippet highlighting

Every page is server-rendered on Cloudflare Workers, hitting D1 directly. No client-side JavaScript framework — pure Astro components. (React had a MessageChannel issue on Workers, so I went framework-free and never looked back.)
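
The data-layer call an Astro page makes during SSR looks roughly like this. The structural `D1Like` type stands in for Cloudflare's real `D1Database` binding, and the table and column names are assumptions:

```typescript
// Minimal structural type for the D1 binding (the real type comes from
// @cloudflare/workers-types).
interface D1Like {
  prepare(sql: string): {
    bind(...values: unknown[]): { all<T>(): Promise<{ results: T[] }> };
  };
}

interface CaseRow {
  id: number;
  name: string;
  citation: string;
}

// Fetch one page of the case listing; 25 per page, newest first.
async function listCases(
  db: D1Like,
  page: number,
  perPage = 25
): Promise<CaseRow[]> {
  const { results } = await db
    .prepare(
      "SELECT id, name, citation FROM cases ORDER BY decision_date DESC LIMIT ? OFFSET ?"
    )
    .bind(perPage, (page - 1) * perPage)
    .all<CaseRow>();
  return results;
}
```

Because the Worker and D1 live in the same runtime, this query runs on every request with no separate database round trip to amortize — which is what makes per-request SSR viable here.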

SEO fundamentals are baked in: dynamic meta tags, canonical URLs, auto-generated sitemap, robots.txt, and structured data on every case page.
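
For the structured data, each case page can emit a JSON-LD block along these lines. `LegalCase` is a real schema.org type; the particular property selection below is a minimal sketch, not the site's exact markup:

```typescript
// Inputs for one case page's JSON-LD block.
interface CaseMeta {
  name: string;
  citation: string;
  decisionDate: string; // ISO date
  courtName: string;
  url: string;
}

// Build the JSON-LD payload for a <script type="application/ld+json"> tag.
// The property choices here are illustrative.
function legalCaseJsonLd(c: CaseMeta): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "LegalCase",
    name: c.name,
    identifier: c.citation,
    datePublished: c.decisionDate,
    sourceOrganization: { "@type": "Organization", name: c.courtName },
    url: c.url,
  });
}
```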

Phase 4 — Entity extraction and enrichment

This is where the site went from a document repository to a legal intelligence platform. I built an extraction pipeline that pulls judges, lawyers, law firms, and parties from judgment text — then enriches them with external data.

For Singapore, that means cross-referencing with the LSRA (MinLaw’s lawyer registry), ACRA (company data), and the judiciary website for judge bios. The result: 4,460 entities with 15,937 attributes — admission years, qualifications, firm affiliations, court designations, Wikipedia bios.
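
The cross-referencing hinges on name normalization: a lawyer appears as "Tan Ah Kow SC" in a judgment and "TAN AH KOW" in the registry. A sketch of the matching key, with an illustrative (not exhaustive) honorific list:

```typescript
// Post-nominals and judicial titles to strip before matching.
// Illustrative list — note a bare "J" pattern like this would also
// eat single-letter initials, which a production matcher must handle.
const HONORIFICS = /\b(SC|JC|JA|CJ|QC|KC|J)\b\.?/g;

// Collapse a name from either source into a comparable join key.
function normalizeName(raw: string): string {
  return raw
    .replace(HONORIFICS, " ")
    .normalize("NFKD")
    .replace(/[^\p{L}\s]/gu, " ") // drop punctuation and accents' marks
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
}
```

Both the judgment-extracted name and the registry name go through the same function, and the normalized strings become the join key for enrichment.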

Every entity gets its own profile page with case history, related entities, and AI-generated narrative content. Rankings pages surface the most active judges, firms, and lawyers by case volume. Practice area pages group cases by legal topic with aggregate stats.

Phase 5 — Multi-country expansion

The architecture was deliberately built to scale. One Drizzle schema, one shared Astro component library (@caselaw/core), one design system (@caselaw/ui) — but separate D1 databases and pipeline scripts per country.

Adding a new country means: write a scraper for the local data source, adapt the entity extractor for local naming conventions, plug it into the shared web layer. The site config is a single TypeScript file per country.
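
That per-country config file looks something like this — the field names are illustrative, not the actual `@caselaw` interfaces:

```typescript
// Shape of the per-country site config consumed by the shared web layer.
// Field names are a sketch of the pattern, not the real interface.
interface CountryConfig {
  code: string; // ISO country code
  siteName: string;
  domain: string;
  courts: Record<string, string>; // court code -> display name
  locale: string;
  noindex: boolean; // non-flagship sites stay out of search indexes
}

const singapore: CountryConfig = {
  code: "SG",
  siteName: "Singapore Case Law",
  domain: "sgcaselaw.com",
  courts: { SGCA: "Court of Appeal", SGHC: "High Court" },
  locale: "en-SG",
  noindex: false,
};
```

Everything country-specific funnels through one typed object, so the shared components never branch on country codes.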

I expanded to three more Commonwealth jurisdictions:

  • Malaysia — 3,997 cases from eJudgment, PDF extraction for judgment text, enrichment from the Malaysian Bar Legal Directory (25,164 lawyers, 10,221 firms)
  • Hong Kong — 9,262 cases from HKLII, enrichment from HKBA (468 barristers matched with chambers, year of call, SC/JC status) and judiciary.hk
  • India — 4,003 cases across 3 courts (Supreme Court, Delhi HC, Bombay HC) from Indian Kanoon, enrichment from sci.gov.in and state HC websites

Each country has its own quirks — Malaysia serves judgments as PDFs with eFILING watermarks baked in, Hong Kong’s catchwords are actually parallel citations, India’s pagination caps at 400 results requiring monthly scrape windows. The pipeline handles all of it.
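
The monthly-window workaround for India's result cap reduces to a date-range generator like this sketch:

```typescript
// Indian Kanoon caps results per query, so each court's history is
// sliced into month-long windows and scraped window by window.
interface ScrapeWindow {
  from: string; // ISO date, inclusive
  to: string;   // ISO date, inclusive
}

function monthlyWindows(startYear: number, endYear: number): ScrapeWindow[] {
  const out: ScrapeWindow[] = [];
  for (let y = startYear; y <= endYear; y++) {
    for (let m = 1; m <= 12; m++) {
      // Day 0 of the next month is the last day of this month.
      const last = new Date(Date.UTC(y, m, 0)).getUTCDate();
      const mm = String(m).padStart(2, "0");
      out.push({
        from: `${y}-${mm}-01`,
        to: `${y}-${mm}-${String(last).padStart(2, "0")}`,
      });
    }
  }
  return out;
}
```

As long as no court publishes more than the cap in a single month, every window stays under the limit and the union of windows covers the full date range with no gaps.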

The Result

The platform now spans 4 countries with 18,000+ cases, 44,000+ entities, and 68,000+ pieces of AI-generated content. sgcaselaw.com is the flagship. The other three sites are deployed but deliberately noindexed — Singapore needs to prove the model first.

Every entity profile has a narrative bio, case history, Q&A content, and cross-links to related entities. Every case has an AI summary, extracted citations, statute references, and links to the judges, lawyers, and parties involved.

The programmatic SEO surface is massive — tens of thousands of unique pages, each targeting long-tail legal queries across four jurisdictions. The content is real (government court judgments), the structure is semantic, and the internal linking is systematic.

The entire thing runs on Cloudflare’s stack for effectively $0/month — Workers, D1, Pages, all free tier.

I wrote about the scaling journey — taking it from one country to four in a month, with 150+ autonomous agent sessions — in a separate blog post.

What I Learned

  • Cloudflare’s stack is absurdly good for this. Workers + D1 + Pages gives you a globally distributed, server-rendered site with a built-in database for effectively $0/month. The DX is excellent.
  • SSR on Workers beats static generation for large catalogs. With 10,000+ pages, static builds would take forever and redeploy on every data update. SSR means the site always reflects the latest data.
  • AI processing at scale is pipeline design, not prompt engineering. The prompts are simple. The hard part is building reliable scraping, deduplication, error handling, and incremental processing.
  • One schema, N databases is the right pattern. Forking code per country is a maintenance nightmare. Shared schema + shared web components + per-country config and pipelines scales cleanly.
  • Enrichment is the moat. Scraping cases is table stakes. The value comes from cross-referencing external data sources — bar associations, judiciary websites, legal directories — and building entity profiles that don’t exist anywhere else.
  • Skip React on Workers. The MessageChannel polyfill issue was a dead end. Pure Astro components render faster, ship less JavaScript, and work perfectly on the edge.