Blog Migration Plan: HubSpot → Brandfolder + Sanity

Last updated: 2026-04-21

Current State

File	Status
`scripts/hubspot-posts-cache.json`	✅ Done — ~1,200 posts cached
`scripts/hubspot-posts-content.json`	✅ Done — full HTML bodies
`scripts/hubspot-posts-enriched.json`	✅ Done — enriched metadata
`scripts/translation-groups-ai.json`	✅ Done — posts grouped by language
`scripts/hubspot-users.json`	✅ Done — authors/users
`scripts/hubspot-tags.json`	✅ Done — tags
`scripts/hubspot-blogs.json`	✅ Done — blog configs
`scripts/all-domains-assets.json`	✅ Done — asset inventory per domain
`scripts/hubspot-posts-blocks.json`	🔄 Partial — Claude block analysis (resumable)
`scripts/taxonomy-output/`	🔄 Partial — de-de run complete; remaining domains pending
`scripts/seo-scores.json`	🔄 Partial — en-us scored; re-run after each domain migrates
`scripts/seo-scores-bad.json`	🔄 Partial — failing posts below threshold (en-us)
`scripts/seo-scores.csv`	🔄 Partial — spreadsheet export (en-us)

Testing flag: all scripts accept --limit <n> to process only n items. Use this for spot-checks only — never for a full domain run.

Open Questions to Decide Before Starting

Slug strategy: Should the Sanity slug match the HubSpot URL path exactly (for SEO redirect mapping), or use the translated title?
legacyHtml field: Needs to be added to blogPost.ts schema — confirm before Phase 6.
Block types: Are the 7 block types (bodyText, bodyImage, bodyVideo, bodyBlockquote, comparisonTable, faqItem, howToModule) the full list, or are there HubSpot modules (sliders, CTAs, forms) that need new schema types before analysis runs?

Domain / Locale Reference

All per-domain runs use the full locale code from HubSpot (language field in hubspot-blogs.json).

Main blogs

Country	Domain code	HubSpot blog name	HubSpot domain
USA / English	`en-us`	The Cold Jet Blog	`blog.coldjet.com`
Germany	`de-de`	Cold Jet Germany Blog	`blog-de.coldjet.com`
France	`fr-fr`	Cold Jet France Blog	`blog-fr.coldjet.com`
Netherlands	`nl-nl`	Cold Jet Netherlands Blog	`blog-nl.coldjet.com`
Belgium (French)	`fr-be`	Cold Jet Belgium (French) Blog	`blog-fr-be.coldjet.com`
Mexico	`es-mx`	Cold Jet Mexico Blog	`blog-mx.coldjet.com`
China	`zh-cn`	Cold Jet China Blog	`blog-cn.coldjet.com`
Poland	`pl-pl`	Cold Jet Poland Blog	`blog-pl.coldjet.com`
Japan	`ja-jp`	Cold Jet Japan Blog	`blog-ja.coldjet.com`
Brazil	`pt-br`	Cold Jet Brazil Blog	`blog-pt-br.coldjet.com`
Belgium (Dutch)	`nl-be`	Cold Jet Belgium (Dutch) Blog	`blog-nl-be.coldjet.com`
Spain	`es-es`	Cold Jet Spain Blog	`blog-es.coldjet.com`
Portugal	`pt-pt`	Cold Jet Portugal Blog	(no domain set)

Full Sequence at a Glance

Phase 1 — Data (mostly already done)
  └── npm run enrich:posts                              # refresh if needed

Phase 2 — Excel taxonomy → JSON → Sanity
  └── Copy excels to scripts/excels/
  └── Copy & fill scripts/excel-convert-config.json from example
  └── npm install                                       # installs xlsx package
  └── npm run convert:taxonomy:excel -- --domain de-de # one domain
  └── npm run convert:taxonomy:excel                   # all domains
  └── Review scripts/taxonomy-output/merged/
  └── npm run push:taxonomy -- --dataset staging

Phase 3 — Asset dedup (manual)
  └── Edit scripts/all-domains-assets.json manually
  └── npm run check:dom-url

Phase 4 — Upload assets → Brandfolder
  └── npm run upload:brandfolder

Phase 5 — Block analysis (Claude AI)
  └── npm run analyze:post-blocks:claude -- --domain de-de   # one domain
  └── npm run analyze:post-blocks:claude                     # all domains

Phase 6 — Blog posts → Sanity (per domain)
  └── npm run push:posts:sanity -- --domain de-de --dataset staging
  └── [QA] spot-check in Sanity Studio
  └── npm run push:posts:sanity -- --domain de-de --dataset production
  └── Repeat: fr-fr → en-us → es-es → pl-pl → ...

Phase 7 — URL validation
  └── npm run check:dom-url

Phase 8 — QA
  └── npx tsc --noEmit && npm run lint
  └── Manual review in Sanity Studio

Phase 1 — Refresh & Verify Source Data

All data is already cached. Verify it is complete and up to date before any writes.

1.1 — Verify post count

# Prints the total number of cached posts to confirm the cache is not empty or truncated
# No output files — console only
node -e "const d=require('./scripts/hubspot-posts-cache.json'); console.log(d.length, 'posts')"

1.2 — Re-fetch if stale

# Fetches all blog configurations (names, IDs, domains) from HubSpot API
# Output → scripts/hubspot-blogs.json
npm run fetch:blogs

# Fetches all HubSpot portal users (name, email, userId)
# Output → scripts/hubspot-users.json
npm run fetch:users

# Fetches all HubSpot blog tags / taxonomy
# Output → scripts/hubspot-tags.json
npm run fetch:taxonomy

# Fetches the full HTML body of every blog post (paginated, concurrent)
# Output → scripts/hubspot-posts-content.json
npm run fetch:post-bodies

# Merges post metadata + bodies + blog info into one enriched dataset
# Reads  → hubspot-posts-cache.json + hubspot-posts-content.json + hubspot-blogs.json
# Output → scripts/hubspot-posts-enriched.json
npm run enrich:posts

1.3 — Confirm translation grouping

# Groups posts that are translations of each other using AI matching
# Reads  → scripts/hubspot-posts-enriched.json
# Output → scripts/translation-groups-ai.json  (array of groups, each group = same article in N languages)
#        → scripts/language-orphans-ai.json     (posts with no matched translation)
#        → scripts/coverage-matrix.csv          (language × domain coverage table)
npm run check:language-coverage:ai

1.4 — Inspect a single post

# Prints all fields of one post to the console — useful for debugging field mappings
# No output files — console only
npm run inspect:post -- --id <hubspot_post_id>

Phase 2 — Taxonomy: Excel → JSON → Sanity

Four tag types and the author list come from Excel files (one per domain), not from HubSpot. The Excel data is richer and more finely categorised than the HubSpot taxonomy.

Tag types (from Excel `Tags` sheet — `de-de` only, shared globally)

Tag type	Sanity doc type	Field on `blogPost`
Technology Tags	`technologyTag`	`technologyTags[]` → array of references
Product Model Tags	`productModelTag`	`productModelTags[]` → array of references
Industry Tags	`industryTag`	`industryTags[]` → array of references
Application Tags	`applicationTag`	`applicationTags[]` → array of references

Authors (from Excel `Authors - Person Schema` sheet — `de-de` only)

Extracted fields per author:

Excel column	Output key	Notes
`Keep as Author?`	`keepAsAuthor`	`Replace` or `Yes`
`Full Name (for Sanity)`	`name`	Used as the Sanity `person` display name + slug source
`Job Title`	`jobTitle`
`Bio (50-100 words)`	`bio`
`Headshot Available?`	`headshotAvailable`
`LinkedIn URL`	`linkedIn`
`Knows About (3-5 topics)`	`knowsAbout`	Split by comma into array
`Credentials`	`credentials`
`Education`	`education`

Template/instruction rows are automatically filtered out using skipIfColumnEmpty: "#" and excludeNames: ["Map to Created By"] in the config.
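The row filter described above could look roughly like the sketch below. It assumes each sheet row is a plain object keyed by column header and reads `skipIfColumnEmpty: "#"` as "drop rows whose `#` column is empty". `filterAuthorRows`, `RowFilterConfig`, and `nameColumn` are hypothetical names for illustration, not the actual internals of convert-taxonomy-excel.ts.

```typescript
// Hypothetical row shape: one object per spreadsheet row, keyed by column header.
type Row = Record<string, string | undefined>;

interface RowFilterConfig {
  skipIfColumnEmpty: string; // e.g. "#": rows with no value in this column are dropped
  excludeNames: string[];    // e.g. ["Map to Created By"]
  nameColumn: string;        // assumption: the column that holds the author name
}

function filterAuthorRows(rows: Row[], config: RowFilterConfig): Row[] {
  return rows.filter((row) => {
    // Template/instruction rows have no value in the marker column.
    const marker = (row[config.skipIfColumnEmpty] || "").trim();
    if (marker === "") return false;
    // Drop rows whose name is on the explicit exclusion list.
    const name = (row[config.nameColumn] || "").trim();
    return config.excludeNames.indexOf(name) === -1;
  });
}
```

Keeping the filter a pure function makes it easy to spot-check against a handful of rows before running the converter on a whole sheet.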

Plus the existing HubSpot sources:

Type	Sanity doc type	Source
Blog categories	`category`	`hubspot-blogs.json` + `hubspot-tags.json`

Step 2.0 — Install dependencies

# Installs all npm packages including xlsx (SheetJS) which is needed to read .xlsx files
# No output files
npm install

Step 2.1 — Place Excel files

One Excel file per domain. The config (excel-convert-config.json) already covers the first 5 — add an entry for each additional domain you have an Excel file for:

scripts/excels/
  blogs-de-de.xlsx       ← Tags sheet + Authors sheet + Blog Posts sheet
  blogs-fr-fr.xlsx       ← Blog Posts sheet only
  blogs-en-us.xlsx
  blogs-es-es.xlsx
  blogs-pl-pl.xlsx
  blogs-es-mx.xlsx       ← add to config if available
  blogs-fr-be.xlsx
  blogs-nl-nl.xlsx
  blogs-nl-be.xlsx
  blogs-pt-br.xlsx
  blogs-ja-jp.xlsx
  blogs-pt-pt.xlsx

Step 2.2 — Create the config file

# Copies the example config — sheet names and column headers are already filled in
# Output → scripts/excel-convert-config.json  (ready to use, just add your domain file paths)
cp scripts/excel-convert-config.example.json scripts/excel-convert-config.json

The config is pre-filled with the real sheet and column structure:

Sheet: Tags — read from de-de only (identical across all files)

Excel column	Output key	Tag type
`industries-tags`	`name`	industryTags
`Technology Tags`	`name`	technologyTags
`Product Model Tags`	`name`	productModelTags
`Application Tags`	`name`	applicationTags

Each column is extracted as a separate outputType — 4 passes over the same sheet, one per tag type.
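One pass of that per-column extraction might be sketched as follows. The slug rule (lowercase, non-alphanumerics collapsed to hyphens) is an assumption chosen to reproduce the slugs shown in the all-tags.json example, such as `co2-cleaning`; the real converter may slugify differently, and this version would strip non-ASCII letters. `slugify` and `extractTagColumn` are hypothetical names.

```typescript
interface Tag {
  name: string;
  slug: string;
}

// Assumed slug rule: lowercase, runs of non-alphanumerics become one hyphen.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// One pass over the Tags sheet for a single column; call once per tag type.
function extractTagColumn(
  rows: Array<Record<string, string | undefined>>,
  column: string
): Tag[] {
  const seen: Record<string, boolean> = {};
  const tags: Tag[] = [];
  for (const row of rows) {
    const name = (row[column] || "").trim();
    if (name === "") continue; // column is shorter than the sheet
    const slug = slugify(name);
    // Assumption: duplicates within a column are collapsed by slug.
    if (Object.prototype.hasOwnProperty.call(seen, slug)) continue;
    seen[slug] = true;
    tags.push({ name, slug });
  }
  return tags;
}
```

Calling `extractTagColumn(rows, "Technology Tags")`, then the same for the other three columns, gives the four passes described above.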

Sheet: Authors - Person Schema — read from de-de only, outputs authors in all-tags.json

Excel column	Output key
`Keep as Author?`	`keepAsAuthor`
`Full Name (for Sanity)`	`name` (also generates `slug`)
`Job Title`	`jobTitle`
`Bio (50-100 words)`	`bio`
`Headshot Available?`	`headshotAvailable`
`LinkedIn URL`	`linkedIn`
`Knows About (3-5 topics)`	`knowsAbout` (array)
`Credentials`	`credentials`
`Education`	`education`

Sheet: Blog Posts — read from every domain file

Excel column	Output key
`URL`	`url` (post identifier)
`Technology Tags`	`technologyTags`
`Product Model Tags`	`productModelTags`
`Industry Tags`	`industryTags`
`Application Tags`	`applicationTags`

Tag columns contain comma-separated tag names. push-taxonomy-to-sanity.ts splits them and resolves each name to a Sanity document ID.
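As a rough illustration of the splitting step, the sketch below turns one Blog Posts row into the post row shape used in posts-<domain>.json. The column names come from the table above; `SheetRow`, `splitTagCell`, and `toPostTagRow` are hypothetical names, and the sketch assumes tag names never contain commas themselves.

```typescript
type SheetRow = Record<string, string | undefined>;

interface PostTagRow {
  domain: string;
  url: string;
  technologyTags: string[];
  productModelTags: string[];
  industryTags: string[];
  applicationTags: string[];
}

// Splits "CO2 Cleaning, Dry Ice" into trimmed names and drops empty parts.
function splitTagCell(cell: string | undefined): string[] {
  if (!cell) return [];
  return cell
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

function toPostTagRow(domain: string, row: SheetRow): PostTagRow {
  return {
    domain,
    url: (row["URL"] || "").trim(),
    technologyTags: splitTagCell(row["Technology Tags"]),
    productModelTags: splitTagCell(row["Product Model Tags"]),
    industryTags: splitTagCell(row["Industry Tags"]),
    applicationTags: splitTagCell(row["Application Tags"]),
  };
}
```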

Only the de-de entry includes the Tags and Authors sheets. All other domains include only the Blog Posts sheet; the tag lists are shared globally, and deduplication in merged/ handles any overlap.

Discovery mode — verify sheet names without writing anything

# Opens the Excel file and prints every sheet name + its column headers, then exits
# No output files — console only
npm run convert:taxonomy:excel -- --file scripts/excels/blogs-de-de.xlsx

Step 2.3 — Run the converter

# Converts a single domain's Excel file to JSON
# Reads  → scripts/excels/blogs-de-de.xlsx  (via config)
# Output → scripts/taxonomy-output/technologyTags-de-de.json
#        → scripts/taxonomy-output/productModelTags-de-de.json
#        → scripts/taxonomy-output/industryTags-de-de.json
#        → scripts/taxonomy-output/applicationTags-de-de.json
#        → scripts/taxonomy-output/posts-de-de.json  (post → tag mapping rows)
npm run convert:taxonomy:excel -- --domain de-de

# Converts all domains defined in the config, then deduplicates across all languages
# Reads  → all Excel files listed in scripts/excel-convert-config.json
# Output → scripts/taxonomy-output/<type>-<domain>.json  (one per sheet per domain)
#        → scripts/taxonomy-output/merged/technologyTags.json   (deduplicated — seed into Sanity)
#        → scripts/taxonomy-output/merged/productModelTags.json
#        → scripts/taxonomy-output/merged/industryTags.json
#        → scripts/taxonomy-output/merged/applicationTags.json
npm run convert:taxonomy:excel

# Same as above but also writes .xlsx files alongside the JSON (optional review format)
# Output → same as above + scripts/taxonomy-output/<type>-<domain>.xlsx
npm run convert:taxonomy:excel -- --format excel

Output structure

scripts/taxonomy-output/
  all-tags.json           ← single object with all 5 types (seed into Sanity)
  posts-de-de.json        ← one row per post: url + 4 tag columns (arrays of tag names)
  posts-fr-fr.json
  posts-en-us.json
  ...

all-tags.json structure:

{
  "industryTags":     [ { "name": "Automotive", "slug": "automotive" }, ... ],
  "technologyTags":   [ { "name": "CO2 Cleaning", "slug": "co2-cleaning" }, ... ],
  "productModelTags": [ { "name": "i3 MicroClean", "slug": "i3-microclean" }, ... ],
  "applicationTags":  [ { "name": "Surface Cleaning", "slug": "surface-cleaning" }, ... ],
  "authors": [
    {
      "keepAsAuthor": "Replace",
      "name": "Matt Caminiti",
      "jobTitle": "Director, Corporate Marketing Communications & Strategy",
      "bio": "",
      "headshotAvailable": "",
      "linkedIn": "https://www.linkedin.com/in/matt-caminiti/",
      "knowsAbout": [],
      "credentials": "",
      "education": "",
      "slug": "matt-caminiti",
      "language": "de-de"
    }
  ]
}

Each post row:

{
  "domain": "de-de",
  "url": "https://www.coldjet.com/de/blog/article-slug/",
  "technologyTags": ["CO2 Cleaning", "Dry Ice"],
  "productModelTags": ["i3 MicroClean"],
  "industryTags": ["Automotive", "Aerospace"],
  "applicationTags": ["Surface Cleaning"]
}

Step 2.4 — Sanity schema changes required

New tag schema files (shape: name, slug, description):

src/sanity/schemaTypes/technologyTag.ts
src/sanity/schemaTypes/productModelTag.ts
src/sanity/schemaTypes/industryTag.ts
src/sanity/schemaTypes/applicationTag.ts

person.ts — ensure it has fields matching the Excel author columns:

name, slug, jobTitle, bio, headshotAvailable, linkedIn,
knowsAbout (array of string), credentials, education, language

blogPost.ts — add a taxonomy group with 4 new reference array fields:

technologyTags   → array of reference → technologyTag
productModelTags → array of reference → productModelTag
industryTags     → array of reference → industryTag
applicationTags  → array of reference → applicationTag
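The four reference array fields listed above could be generated with a small helper like the one below. It uses the plain-object form that Sanity schema definitions accept. The group name `taxonomy`, its title, and the helper name `tagReferenceField` are assumptions for illustration; in blogPost.ts the group would also need to be declared in the document's `groups` array.

```typescript
// Assumed group definition; add it to the `groups` array of the blogPost document type.
const taxonomyGroup = { name: "taxonomy", title: "Taxonomy" };

// Builds one "array of reference" field pointing at a single tag document type.
function tagReferenceField(name: string, targetType: string) {
  return {
    name,
    type: "array",
    group: taxonomyGroup.name,
    of: [{ type: "reference", to: [{ type: targetType }] }],
  };
}

// Spread these into the `fields` array of blogPost.ts.
const taxonomyFields = [
  tagReferenceField("technologyTags", "technologyTag"),
  tagReferenceField("productModelTags", "productModelTag"),
  tagReferenceField("industryTags", "industryTag"),
  tagReferenceField("applicationTags", "applicationTag"),
];
```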

blogPost.ts — add legacyHtml field for rollback safety:

{ name: 'legacyHtml', type: 'text', readOnly: true, hidden: true }

index.ts — register all 4 new types.

Step 2.5 — Seed taxonomy into Sanity

# Creates category, person, and all 4 tag type documents in the Sanity staging dataset
# Reads  → scripts/taxonomy-output/merged/technologyTags.json
#        → scripts/taxonomy-output/merged/productModelTags.json
#        → scripts/taxonomy-output/merged/industryTags.json
#        → scripts/taxonomy-output/merged/applicationTags.json
#        → scripts/hubspot-tags.json   (→ category documents)
#        → scripts/hubspot-users.json  (→ person documents)
# Output → scripts/sanity-id-map.json  (maps every hubspot/slug id → sanity document _id)
npm run push:taxonomy -- --dataset staging

# After reviewing staging in Sanity Studio — promote to production
# Reads  → same files as above
# Output → updates scripts/sanity-id-map.json with production _ids
npm run push:taxonomy -- --dataset production

scripts/sanity-id-map.json structure:

{
  "categories":       { "<hubspot_tag_id>":  "<sanity_doc_id>" },
  "persons":          { "<hubspot_user_id>": "<sanity_doc_id>" },
  "technologyTags":   { "<slug>": "<sanity_doc_id>" },
  "productModelTags": { "<slug>": "<sanity_doc_id>" },
  "industryTags":     { "<slug>": "<sanity_doc_id>" },
  "applicationTags":  { "<slug>": "<sanity_doc_id>" }
}

Step 2.6 — Build the post → tag mapping

The posts-<domain>.json files record which tag names belong to each post. The taxonomy push script consolidates them into scripts/post-tag-map.json, using sanity-id-map.json to resolve tag names to Sanity IDs:

# Produced automatically at the end of npm run push:taxonomy
# Reads  → scripts/taxonomy-output/posts-*.json  (all domains)
#        → scripts/sanity-id-map.json
# Output → scripts/post-tag-map.json

scripts/post-tag-map.json structure:

{
  "<hubspot_post_id>": {
    "technologyTags":   ["<sanity_id>", ...],
    "productModelTags": ["<sanity_id>", ...],
    "industryTags":     ["<sanity_id>", ...],
    "applicationTags":  ["<sanity_id>", ...]
  }
}
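A minimal sketch of how that consolidation might work, since push-taxonomy-to-sanity.ts is not written yet. It assumes tag names are matched to sanity-id-map.json by slug, using the same assumed slug rule as the examples in all-tags.json. It also assumes a `urlToPostId` lookup exists: post rows are keyed by URL while post-tag-map.json is keyed by HubSpot post ID, and the plan does not say how the two are joined. All function and type names here are hypothetical.

```typescript
type TagType = "technologyTags" | "productModelTags" | "industryTags" | "applicationTags";
const TAG_TYPES: TagType[] = ["technologyTags", "productModelTags", "industryTags", "applicationTags"];

type TagIdMap = Record<TagType, Record<string, string>>; // slug → Sanity document ID
type PostTagRow = { url: string } & Record<TagType, string[]>;
type PostTagMap = Record<string, Record<TagType, string[]>>;

// Assumed slug rule, matching slugs such as "co2-cleaning".
function slugify(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function buildPostTagMap(
  rows: PostTagRow[],
  idMap: TagIdMap,
  urlToPostId: Record<string, string> // assumption: built elsewhere, e.g. from the enriched posts
): { map: PostTagMap; unresolved: string[] } {
  const map: PostTagMap = {};
  const unresolved: string[] = [];
  for (const row of rows) {
    const postId = urlToPostId[row.url];
    if (!postId) {
      unresolved.push(`no post id for ${row.url}`);
      continue;
    }
    const entry: Record<TagType, string[]> = {
      technologyTags: [],
      productModelTags: [],
      industryTags: [],
      applicationTags: [],
    };
    for (const type of TAG_TYPES) {
      for (const name of row[type]) {
        const id = idMap[type][slugify(name)];
        if (id) entry[type].push(id);
        else unresolved.push(`${type}: "${name}" (${row.url})`);
      }
    }
    map[postId] = entry;
  }
  return { map, unresolved };
}
```

Returning the unresolved names alongside the map makes typos in the Excel tag columns visible before any post is pushed.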

Phase 3 — Asset Extraction & Deduplication

scripts/all-domains-assets.json already exists — every image, video, and document URL found in blog content, grouped by domain.

3.1 — Review duplicates manually

# Prints the list of domain keys inside the asset file — shows which domains have assets
# No output files — console only
node -e "const d=require('./scripts/all-domains-assets.json'); console.log(Object.keys(d))"

Then open scripts/all-domains-assets.json and for each duplicate entry add:

"dedupOf": "<canonical_url>" — points to the authoritative version
"skip": true — for assets that should not be migrated at all

Brandfolder detects binary duplicates on upload automatically. Keep the admin view open at Settings → General Settings → Advanced → Manage Deleted Assets to catch and resolve those.

3.2 — Validate URLs before uploading

# Sends a HEAD request to every asset URL and records the HTTP status
# Reads  → scripts/all-domains-assets.json
# Output → scripts/check-valid-all.json   (url + status + response time per asset)
#        → scripts/check-valid-all.csv    (same, spreadsheet-friendly)
npm run check:dom-url

Only upload assets with status 200. Fix or skip anything else before Phase 4.
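The selection rule for Phase 4 can be sketched as a single filter over the asset list. The entry shapes below are assumptions (the real all-domains-assets.json groups assets by domain and may carry more fields). The sketch also assumes that entries marked with `dedupOf` are not uploaded themselves, on the reading that they should reuse the canonical asset's Brandfolder URL.

```typescript
// Assumed shapes; flatten the per-domain groups before calling selectUploadable.
interface AssetEntry {
  url: string;
  dedupOf?: string; // canonical URL this entry duplicates
  skip?: boolean;   // true = do not migrate at all
}

interface UrlCheck {
  url: string;
  status: number; // HTTP status recorded by check:dom-url
}

function selectUploadable(assets: AssetEntry[], checks: UrlCheck[]): AssetEntry[] {
  const statusByUrl: Record<string, number> = {};
  for (const check of checks) {
    statusByUrl[check.url] = check.status;
  }
  // Keep only canonical, non-skipped assets whose URL answered 200.
  return assets.filter(
    (asset) => !asset.skip && !asset.dedupOf && statusByUrl[asset.url] === 200
  );
}
```

An asset with no entry in the check results is treated as not validated and is left out.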

Phase 4 — Upload Assets to Brandfolder

# Uploads every non-skipped, validated asset to Brandfolder
# Reads  → scripts/all-domains-assets.json   (asset list with dedup markers)
#        → scripts/check-valid-all.json       (skip anything not 200)
# Output → scripts/brandfolder-url-map.json  (old hubspot url → new brandfolder CDN url)
#        → Sanity migrationAssetLog documents (one per uploaded asset: sourceUrl, brandfolderId, CDN url, uploadedAt)
npm run upload:brandfolder

scripts/brandfolder-url-map.json is the critical handoff file to Phase 6. Every HubSpot CDN link in post content will be replaced using this map.
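One way the replacement could be done is a single pass over the HTML with all old URLs combined into one pattern, sketched below. This is an illustration under the assumption that brandfolder-url-map.json is a flat object of old URL → new URL; `swapAssetUrls` is a hypothetical name. Sorting the old URLs longest-first ensures a URL with a query string is matched before its shorter prefix.

```typescript
// Escapes characters that have special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function swapAssetUrls(html: string, urlMap: Record<string, string>): string {
  const oldUrls = Object.keys(urlMap).sort((a, b) => b.length - a.length);
  if (oldUrls.length === 0) return html; // an empty pattern would match everywhere
  const pattern = new RegExp(oldUrls.map(escapeRegExp).join("|"), "g");
  // Single pass: replaced text is never scanned again.
  return html.replace(pattern, (match) => urlMap[match]);
}
```

A single pass avoids the case where an already replaced URL is rewritten a second time by a later map entry.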

Phase 5 — Content Block Analysis (Claude AI)

Transform raw HubSpot HTML into typed Sanity bodySections blocks using the Anthropic Batches API.

# Submits all posts for the given domain to Claude via the Batches API, then polls for results
# Reads  → scripts/hubspot-posts-enriched.json   (post metadata + domain info)
#        → scripts/hubspot-posts-content.json     (raw HTML bodies)
#        → scripts/post-blocks-cache.json         (resume state — skip already-processed posts)
# Output → scripts/hubspot-posts-blocks.json      (structured bodySections array per post ID)
#        → scripts/post-blocks-cache.json         (updated cache — safe to resume from here)
#        → scripts/post-blocks-batches.json       (Anthropic batch IDs and status)
#        → scripts/post-blocks-errors.json        (posts Claude could not parse — fix manually)
npm run analyze:post-blocks:claude -- --domain de-de

# Same but processes all domains in sequence
# Output → same files as above, populated for all posts across all domains
npm run analyze:post-blocks:claude

The script is fully resumable — if it is stopped mid-run, re-running picks up from the cache.
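The resume behaviour can be pictured as a filter against the cache before anything is submitted. The sketch below assumes post-blocks-cache.json is an object keyed by HubSpot post ID and that posts carry an `id` field; both are assumptions, and `pendingPosts` is a hypothetical name. It also shows where a `--limit <n>` value would apply.

```typescript
interface PostRef {
  id: string; // assumption: HubSpot post ID as a string
}

// Assumption: cache is keyed by post ID; the value shape does not matter here.
type BlocksCache = Record<string, unknown>;

function pendingPosts<T extends PostRef>(posts: T[], cache: BlocksCache, limit?: number): T[] {
  // Skip every post that already has a cached result.
  const todo = posts.filter(
    (post) => !Object.prototype.hasOwnProperty.call(cache, post.id)
  );
  // Apply --limit after the cache filter so a limited run still makes progress.
  return typeof limit === "number" ? todo.slice(0, limit) : todo;
}
```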

Block types produced

Block type	Source trigger
`bodyText`	Every H2/H3 section and surrounding paragraphs
`bodyImage`	`<img>` tags extracted from content flow
`bodyVideo`	YouTube / Vimeo / Wistia iframes
`bodyBlockquote`	`<blockquote>` elements
`comparisonTable`	`<table>` elements
`faqItem`	FAQ patterns detected by Claude
`howToModule`	Numbered step guides

Phase 6 — Sanity Blog Post Migration (Per Domain)

Action: Write `scripts/push-posts-to-sanity.ts` (new script)

Accepts --domain and --dataset flags. For each post in the target domain:

Read translation-groups-ai.json — find all posts for that domain
Map HubSpot fields → Sanity blogPost fields
Resolve author / category references from sanity-id-map.json
Resolve the 4 tag reference arrays from post-tag-map.json
Swap asset URLs using brandfolder-url-map.json
Write bodySections from hubspot-posts-blocks.json
Write raw HTML to legacyHtml field
Create the document with _type: 'blogPost'
Output scripts/sanity-migration-log-<domain>.json for audit
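Since this script is still to be written, the sketch below shows how the mapping steps above might come together for one post. The `EnrichedPost` field names, the deterministic `_id`, and all helper names are assumptions for illustration. The reference and slug object shapes follow the standard Sanity document format. Category references are left out for brevity.

```typescript
// Field names are assumed, not taken from hubspot-posts-enriched.json.
interface EnrichedPost {
  id: string;
  title: string;
  slug: string;
  publishedAt: string;
  authorId: string;
}

type TagType = "technologyTags" | "productModelTags" | "industryTags" | "applicationTags";

interface SanityReference {
  _type: "reference";
  _ref: string;
  _key: string;
}

interface MigrationInputs {
  personIds: Record<string, string>; // "persons" section of sanity-id-map.json
  tagIds: Record<TagType, string[]>; // this post's entry in post-tag-map.json
  bodySections: unknown[];           // this post's entry in hubspot-posts-blocks.json
  legacyHtml: string;                // raw HTML from hubspot-posts-content.json
}

// Items in Sanity arrays need a _key that is unique within the array.
function toReferences(ids: string[]): SanityReference[] {
  return ids.map((id, index): SanityReference => ({
    _type: "reference",
    _ref: id,
    _key: `ref-${index}`,
  }));
}

function buildBlogPostDoc(post: EnrichedPost, inputs: MigrationInputs) {
  const authorId = inputs.personIds[post.authorId];
  return {
    // Assumption: a deterministic ID so a re-run replaces the document instead of duplicating it.
    _id: `blogPost-${post.id}`,
    _type: "blogPost",
    title: post.title,
    slug: { _type: "slug", current: post.slug },
    publishedAt: post.publishedAt,
    author: authorId ? { _type: "reference", _ref: authorId } : undefined,
    technologyTags: toReferences(inputs.tagIds.technologyTags),
    productModelTags: toReferences(inputs.tagIds.productModelTags),
    industryTags: toReferences(inputs.tagIds.industryTags),
    applicationTags: toReferences(inputs.tagIds.applicationTags),
    bodySections: inputs.bodySections,
    legacyHtml: inputs.legacyHtml,
  };
}
```

A document built this way could be written with the `createOrReplace` method of `@sanity/client`, which pairs naturally with a deterministic `_id`. A post whose author is missing from the ID map is better recorded as failed in the migration log than written without a reference.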

# Creates all blog post documents for de-de in the staging dataset
# Reads  → scripts/translation-groups-ai.json   (which posts belong to de-de)
#        → scripts/hubspot-posts-enriched.json   (fields: title, slug, publishedAt, author...)
#        → scripts/hubspot-posts-blocks.json     (bodySections blocks from Phase 5)
#        → scripts/hubspot-posts-content.json    (raw HTML for legacyHtml field)
#        → scripts/sanity-id-map.json            (resolve author, category, tag refs)
#        → scripts/post-tag-map.json             (resolve 4 tag arrays)
#        → scripts/brandfolder-url-map.json      (swap image URLs)
# Output → Sanity blogPost documents in the staging dataset
#        → scripts/sanity-migration-log-de-de.json  (log of every created/skipped/failed doc)
npm run push:posts:sanity -- --domain de-de --dataset staging

# After QA passes in Sanity Studio — promote to production
# Reads  → same files as above
# Output → Sanity blogPost documents in the production dataset
#        → scripts/sanity-migration-log-de-de.json  (updated with production doc _ids)
npm run push:posts:sanity -- --domain de-de --dataset production

Migration order

de-de → fr-fr → en-us → es-es → pl-pl → es-mx → fr-be → nl-nl → nl-be → pt-br → ja-jp → zh-cn → pt-pt

Start with de-de — one of the largest domains — so issues surface early. Academy blogs (en, pl, de, fr, ja) should be confirmed in scope separately before migrating.

Per-domain checklist

All posts created in Sanity staging dataset
Spot-check 5 posts manually in Sanity Studio
All 4 tag reference arrays populated on each post
All Brandfolder image URLs resolve in browser
legacyHtml field populated on each document
Author and category references resolve correctly
Run npm run check:dom-url on a sample of migrated Brandfolder URLs
Promote to production only after all checks pass

Phase 7 — URL Validation Pass

# Re-validates all URLs that appear in migrated content — catches any Brandfolder CDN issues
# Reads  → scripts/brandfolder-url-map.json  (all new CDN URLs to check)
# Output → scripts/check-valid-all.json       (updated — any non-200 URLs flagged for manual fix)
#        → scripts/check-valid-all.csv
npm run check:dom-url

Use scripts/brandfolder-url-map.json to find any remaining HubSpot CDN URLs not yet replaced and fix them before the next domain runs.
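That check could be scripted rather than done by eye. The sketch below assumes brandfolder-url-map.json is a flat object of old URL → new URL and that migrated post bodies are available as strings keyed by post ID; `findUnreplacedUrls` and `auditMigratedBodies` are hypothetical names.

```typescript
// Returns every old (HubSpot) URL from the map that still appears in the content.
function findUnreplacedUrls(content: string, urlMap: Record<string, string>): string[] {
  return Object.keys(urlMap).filter((oldUrl) => content.indexOf(oldUrl) !== -1);
}

// Runs the check over many posts and keeps only the posts that still have leftovers.
function auditMigratedBodies(
  bodies: Record<string, string>, // assumption: post ID → migrated body as a string
  urlMap: Record<string, string>
): Record<string, string[]> {
  const report: Record<string, string[]> = {};
  for (const postId of Object.keys(bodies)) {
    const leftovers = findUnreplacedUrls(bodies[postId], urlMap);
    if (leftovers.length > 0) report[postId] = leftovers;
  }
  return report;
}
```

An empty report for a domain is a reasonable gate before starting the next one.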

Phase 8 — QA & Finalisation

# Checks TypeScript types across the entire project — must pass with zero errors
# No output files — exits non-zero if there are type errors
npx tsc --noEmit

# Runs ESLint across the project — must pass with zero errors before committing
# No output files — exits non-zero on lint errors
npm run lint

Manual checks per domain batch

Browse 10 random posts in Sanity Studio — are bodySections blocks correct?
Are all images displaying via Brandfolder CDN?
Are all 4 tag types populated correctly on each post?
Are author bylines and categories populated with references (not raw strings)?
Does the slug match the original HubSpot URL path (for redirect mapping)?
If a post had a video — does the bodyVideo block have the correct embed URL?
Check scripts/post-blocks-errors.json — manually fix any posts Claude could not parse

Phase 9 — SEO Scoring & AI Content Enhancement (Post-Migration)

These two scripts run after migration is complete and posts are live in Sanity. They are optional but strongly recommended before launching each country.

9.1 — SEO Scoring Script

Script: scripts/score-seo.ts ✅ Written
Command: npm run score:seo

Scores every post against 11 SEO best-practice rules and writes the results to JSON/CSV. No content is changed — read-only analysis. Max score is 90 pts.

Scoring rules (11 rules, max 90 pts)

Rule	Check	Points
Title length	Between 50–60 characters	10
Title has primary keyword	Keyword (derived from slug) appears in title	10
Meta description length	Between 120–160 characters	10
H1 present	At least one `<h1>` in body HTML	5
H2/H3 structure	At least 2 section headings in body	10
Word count	Body content ≥ 300 words	10
Image alt text	All `<img>` tags in body have non-empty alt text	10
Featured image	`featuredImage` field is populated	5
Keyword density	Primary keyword appears 1–3% of body word count	10
Slug quality	Lowercase, hyphens only, ≤ 75 chars	5
Internal links	At least 1 internal link (coldjet.com or relative) in body	5
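To make the table concrete, here is a sketch of three of the eleven rules plus the total. It is not the code in scripts/score-seo.ts: the function names, the tag-stripping approach to word counting, and the reading of "hyphens only" as lowercase letters and digits separated by single hyphens are all assumptions.

```typescript
interface RuleResult {
  pass: boolean;
  points: number; // points earned, 0 when the rule fails
  note: string;
}

// Title length: 50 to 60 characters inclusive, worth 10 points.
function scoreTitleLength(title: string): RuleResult {
  const length = title.length;
  const pass = length >= 50 && length <= 60;
  return { pass, points: pass ? 10 : 0, note: `${length} chars` };
}

// Word count: strip tags, split on whitespace, require at least 300 words, worth 10 points.
function scoreWordCount(bodyHtml: string): RuleResult {
  const text = bodyHtml.replace(/<[^>]*>/g, " ");
  const words = text.split(/\s+/).filter((word) => word.length > 0).length;
  const pass = words >= 300;
  return { pass, points: pass ? 10 : 0, note: `${words} words` };
}

// Slug quality: lowercase alphanumerics joined by hyphens, at most 75 chars, worth 5 points.
function scoreSlugQuality(slug: string): RuleResult {
  const pass = /^[a-z0-9]+(-[a-z0-9]+)*$/.test(slug) && slug.length <= 75;
  return { pass, points: pass ? 5 : 0, note: `${slug.length} chars` };
}

function totalScore(results: RuleResult[]): number {
  return results.reduce((sum, result) => sum + result.points, 0);
}
```

With all eleven rules implemented the same way, the points in the table sum to the stated maximum of 90.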

Usage

# Score all posts across all domains
# Reads  → scripts/hubspot-posts-enriched.json
#        → scripts/hubspot-posts-content.json
# Output → scripts/seo-scores.json   (one entry per post with score + per-rule breakdown)
#        → scripts/seo-scores.csv    (same, spreadsheet-friendly)
npm run score:seo

# Score one domain only
npm run score:seo -- --domain de-de

# Show only posts below threshold + print worst 10 to console (default threshold: 60)
# Output → scripts/seo-scores-bad.json  (failing posts sorted worst-first)
npm run score:seo -- --domain de-de --threshold 60 --bad-only

Output structure

scripts/seo-scores.json — array of post results:

[
  {
    "postId": "7359675231",
    "domain": "en-us",
    "title": "3 ways dry ice blasting is used in the automotive industry",
    "slug": "3-ways-dry-ice-blasting-is-used-in-the-automotive-industry",
    "score": 65,
    "passed": 8,
    "failed": 3,
    "rules": {
      "titleLength":    { "pass": true,  "note": "58 chars — ideal (50–60)" },
      "wordCount":      { "pass": false, "note": "218 words — below 300 minimum" },
      "imageAltText":   { "pass": false, "note": "2 of 4 images missing alt text" },
      "internalLinks":  { "pass": true,  "note": "3 internal links found" }
    }
  }
]

scripts/seo-scores-bad.json — only failing posts (score < threshold), sorted worst-first. Same shape as above.
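The bad-only output amounts to a filter and a sort over the scored results. A minimal sketch, assuming each result has a numeric `score` as in the example above; `selectFailing` is a hypothetical name and the default of 60 mirrors the documented default threshold.

```typescript
interface ScoredPost {
  postId: string;
  score: number;
}

// Keeps posts strictly below the threshold, lowest score first.
function selectFailing<T extends ScoredPost>(results: T[], threshold: number = 60): T[] {
  return results
    .filter((result) => result.score < threshold)
    .sort((a, b) => a.score - b.score);
}
```

Because `filter` returns a new array, sorting it does not reorder the original results.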

9.2 — AI Content Enhancement Script

Script to write: scripts/enhance-content-claude.ts
Command to add: npm run enhance:content

Uses Claude to improve post content based on the SEO score results. This script is gated behind an explicit flag — it will refuse to run unless --ai-enhance is passed. This prevents accidental bulk rewrites.

Enhanced posts are flagged in both the JSON output and in Sanity so editors know to review them before publishing.
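Since the script is not written yet, the gate could be implemented along these lines. The flag names come from this plan; the parsing approach and the names `parseEnhanceArgs` and `assertEnhanceAllowed` are assumptions for illustration. The functions take the argument list explicitly so they can be tested without touching Sanity.

```typescript
interface EnhanceOptions {
  domain?: string;
  aiEnhance: boolean;
  dryRun: boolean;
  threshold: number;
}

// Minimal flag parser; pass it the arguments that follow the script name.
function parseEnhanceArgs(argv: string[]): EnhanceOptions {
  const options: EnhanceOptions = { aiEnhance: false, dryRun: false, threshold: 60 };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--ai-enhance") options.aiEnhance = true;
    else if (arg === "--dry-run") options.dryRun = true;
    else if (arg === "--domain") options.domain = argv[++i];
    else if (arg === "--threshold") options.threshold = Number(argv[++i]);
  }
  return options;
}

// Call this before any Sanity client is created so nothing can be written by accident.
function assertEnhanceAllowed(options: EnhanceOptions): void {
  if (!options.aiEnhance) {
    throw new Error("This script rewrites Sanity content. Pass --ai-enhance to confirm.");
  }
}
```

Checking the gate first means a forgotten flag fails fast, before any Claude or Sanity request is made.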

What Claude enhances (per failing rule in seo-scores.json)

Failing rule	Enhancement
`titleLength`	Rewrites title to fit 50–60 chars while keeping meaning
`metaLength`	Rewrites meta description to hit 120–160 chars
`imageAltText`	Generates descriptive alt text from image URL + post context
`keywordDensity`	Adjusts keyword placement in `bodyText` blocks
`wordCount`	Flags post as too short — suggests expansion topics (does not auto-expand)

Claude never auto-expands short posts; those require human judgement. It only rewrites fields where the fix is narrowly scoped and easy to review: title, meta description, alt text, and keyword placement in bodyText blocks.

Usage

# Without --ai-enhance: the script exits immediately with an explanation
npm run enhance:content -- --domain de-de
# ❌  This script rewrites Sanity content. Pass --ai-enhance to confirm.

# With the flag: processes all posts below --threshold (default 60) for the domain
# Reads  → scripts/seo-scores.json         (which posts need enhancement and which rules failed)
#        → Sanity blogPost documents        (current field values)
# Output → scripts/enhance-log-de-de.json  (what was changed, old value vs new value)
#        → Sanity blogPost documents        (updated fields + aiEnhanced: true flag)
npm run enhance:content -- --domain de-de --ai-enhance

# Dry run — shows what would change without writing anything to Sanity
# Output → scripts/enhance-preview-de-de.json  (proposed changes only)
npm run enhance:content -- --domain de-de --ai-enhance --dry-run

Sanity flag on enhanced posts

A field aiEnhanced (boolean, hidden from editors by default) is set to true on every post that Claude touches. This lets editors filter and review all AI-enhanced posts in Sanity Studio before sign-off:

// In blogPost.ts — add to schema:
{
  name: 'aiEnhanced',
  title: 'AI Enhanced — Needs Review',
  type: 'boolean',
  initialValue: false,
  description: 'Set automatically when Claude rewrites any field. Must be reviewed before publishing.',
}

Editors can query all enhanced posts in Sanity Studio:

*[_type == "blogPost" && aiEnhanced == true] | order(publishedAt desc)

Enhancement log

scripts/enhance-log-de-de.json — full audit trail of every change:

[
  {
    "postId": "blogPost-abc123",
    "domain": "de-de",
    "field": "title",
    "before": "How Dry Ice Cleaning Works In Industrial Applications And Why You Should Care",
    "after": "How Dry Ice Cleaning Works: Industrial Guide",
    "rule": "titleLength",
    "enhancedAt": "2026-04-21T14:30:00Z"
  }
]

9.3 — Workflow: Score → Review → Enhance → Re-score

# 1. Score all posts after migration
npm run score:seo -- --domain de-de

# 2. Review scripts/seo-scores-bad.json — decide which posts are worth enhancing

# 3. Run enhancement (gated — requires explicit flag)
npm run enhance:content -- --domain de-de --ai-enhance --dry-run   # preview first
npm run enhance:content -- --domain de-de --ai-enhance             # apply

# 4. Re-score to verify improvement
npm run score:seo -- --domain de-de

# 5. In Sanity Studio — review all aiEnhanced posts before publishing
# GROQ: *[_type == "blogPost" && aiEnhanced == true && domain == "de-de"]

New scripts summary

Script	Command	Gate flag	Reads	Writes
`score-seo.ts`	`npm run score:seo`	None — read-only	`hubspot-posts-enriched.json` + `hubspot-posts-content.json`	`seo-scores.json`, `seo-scores-bad.json`, `seo-scores.csv`
`enhance-content-claude.ts`	`npm run enhance:content`	`--ai-enhance` required	seo-scores.json + Sanity	`enhance-log-<domain>.json` + Sanity updates + `aiEnhanced` flag

All Scripts — Reference Table

Script	Command	Status	Reads	Writes
`fetch-hubspot-blogs.ts`	`npm run fetch:blogs`	✅ Exists	HubSpot API	`hubspot-blogs.json`
`fetch-hubspot-users.ts`	`npm run fetch:users`	✅ Exists	HubSpot API	`hubspot-users.json`
`fetch-hubspot-taxonomy.ts`	`npm run fetch:taxonomy`	✅ Exists	HubSpot API	`hubspot-tags.json`
`fetch-post-bodies.ts`	`npm run fetch:post-bodies`	✅ Exists	HubSpot API	`hubspot-posts-content.json`
`enrich-hubspot-posts.ts`	`npm run enrich:posts`	✅ Exists	posts-cache + content + blogs	`hubspot-posts-enriched.json`
`check-language-coverage-ai.ts`	`npm run check:language-coverage:ai`	✅ Exists	posts-enriched	`translation-groups-ai.json`, `language-orphans-ai.json`, `coverage-matrix.csv`
`extract-domain-assets.ts`	`npm run extract:assets`	✅ Exists	posts-content	`all-domains-assets.json`
`check-dom-url.ts`	`npm run check:dom-url`	✅ Exists	all-domains-assets / brandfolder-url-map	`check-valid-all.json`, `check-valid-all.csv`
`upload-to-brandfolder.ts`	`npm run upload:brandfolder`	✅ Exists	all-domains-assets + check-valid	`brandfolder-url-map.json` + Sanity migrationAssetLog
`analyze-posts-blocks-claude.ts`	`npm run analyze:post-blocks:claude`	✅ Exists	posts-enriched + posts-content	`hubspot-posts-blocks.json`, `post-blocks-cache.json`, `post-blocks-batches.json`, `post-blocks-errors.json`
`convert-taxonomy-excel.ts`	`npm run convert:taxonomy:excel`	✅ Written	Excel files + config	`taxonomy-output/<type>-<domain>.json`, `taxonomy-output/merged/<type>.json`
`push-taxonomy-to-sanity.ts`	`npm run push:taxonomy`	⏳ To write	merged taxonomy JSONs + hubspot-tags/users	`sanity-id-map.json`, `post-tag-map.json` + Sanity docs
`push-posts-to-sanity.ts`	`npm run push:posts:sanity`	⏳ To write	all JSON files above	`sanity-migration-log-<domain>.json` + Sanity blogPost docs
`score-seo.ts`	`npm run score:seo`	✅ Written	`hubspot-posts-enriched.json` + `hubspot-posts-content.json`	`seo-scores.json`, `seo-scores-bad.json`, `seo-scores.csv`
`enhance-content-claude.ts`	`npm run enhance:content`	⏳ To write	seo-scores.json + Sanity	`enhance-log-<domain>.json` + Sanity updates

All scripts support --limit <n> for testing with a small sample. Never use --limit in a full domain run.
