Blog Migration Plan: HubSpot → Brandfolder + Sanity

Last updated: 2026-04-21


Current State

File | Status
scripts/hubspot-posts-cache.json | ✅ Done: ~1,200 posts cached
scripts/hubspot-posts-content.json | ✅ Done: full HTML bodies
scripts/hubspot-posts-enriched.json | ✅ Done: enriched metadata
scripts/translation-groups-ai.json | ✅ Done: posts grouped into translation sets (same article across languages)
scripts/hubspot-users.json | ✅ Done: authors/users
scripts/hubspot-tags.json | ✅ Done: tags
scripts/hubspot-blogs.json | ✅ Done: blog configs
scripts/all-domains-assets.json | ✅ Done: asset inventory per domain
scripts/hubspot-posts-blocks.json | 🔄 Partial: Claude block analysis (resumable)
scripts/taxonomy-output/ | 🔄 Partial: de-de run complete; remaining domains pending
scripts/seo-scores.json | 🔄 Partial: en-us scored; re-run after each domain migrates
scripts/seo-scores-bad.json | 🔄 Partial: posts scoring below threshold (en-us)
scripts/seo-scores.csv | 🔄 Partial: spreadsheet export (en-us)

Testing flag: all scripts accept --limit <n> to process only n items. Use this for spot-checks only — never for a full domain run.


Open Questions to Decide Before Starting

  1. Slug strategy: Should the Sanity slug match the HubSpot URL path exactly (for SEO redirect mapping), or use the translated title?
  2. legacyHtml field: Needs to be added to blogPost.ts schema — confirm before Phase 6.
  3. Block types: Are the 7 block types (bodyText, bodyImage, bodyVideo, bodyBlockquote, comparisonTable, faqItem, howToModule) the full list, or are there HubSpot modules (sliders, CTAs, forms) that need new schema types before analysis runs?

Domain / Locale Reference

All per-domain runs use the full locale code from HubSpot (language field in hubspot-blogs.json).

Main blogs

Country | Domain code | HubSpot blog name | HubSpot domain
USA / English | en-us | The Cold Jet Blog | blog.coldjet.com
Germany | de-de | Cold Jet Germany Blog | blog-de.coldjet.com
France | fr-fr | Cold Jet France Blog | blog-fr.coldjet.com
Netherlands | nl-nl | Cold Jet Netherlands Blog | blog-nl.coldjet.com
Belgium (French) | fr-be | Cold Jet Belgium (French) Blog | blog-fr-be.coldjet.com
Mexico | es-mx | Cold Jet Mexico Blog | blog-mx.coldjet.com
China | zh-cn | Cold Jet China Blog | blog-cn.coldjet.com
Poland | pl-pl | Cold Jet Poland Blog | blog-pl.coldjet.com
Japan | ja-jp | Cold Jet Japan Blog | blog-ja.coldjet.com
Brazil | pt-br | Cold Jet Brazil Blog | blog-pt-br.coldjet.com
Belgium (Dutch) | nl-be | Cold Jet Belgium (Dutch) Blog | blog-nl-be.coldjet.com
Spain | es-es | Cold Jet Spain Blog | blog-es.coldjet.com
Portugal | pt-pt | Cold Jet Portugal Blog | (no domain set)

Full Sequence at a Glance

Phase 1 — Data (mostly already done)
  └── npm run enrich:posts                              # refresh if needed

Phase 2 — Excel taxonomy → JSON → Sanity
  └── Copy excels to scripts/excels/
  └── Copy & fill scripts/excel-convert-config.json from example
  └── npm install                                       # installs xlsx package
  └── npm run convert:taxonomy:excel -- --domain de-de # one domain
  └── npm run convert:taxonomy:excel                   # all domains
  └── Review scripts/taxonomy-output/merged/
  └── npm run push:taxonomy -- --dataset staging

Phase 3 — Asset dedup (manual)
  └── Edit scripts/all-domains-assets.json manually
  └── npm run check:dom-url

Phase 4 — Upload assets → Brandfolder
  └── npm run upload:brandfolder

Phase 5 — Block analysis (Claude AI)
  └── npm run analyze:post-blocks:claude -- --domain de-de   # one domain
  └── npm run analyze:post-blocks:claude                     # all domains

Phase 6 — Blog posts → Sanity (per domain)
  └── npm run push:posts:sanity -- --domain de-de --dataset staging
  └── [QA] spot-check in Sanity Studio
  └── npm run push:posts:sanity -- --domain de-de --dataset production
  └── Repeat: fr-fr → en-us → es-es → pl-pl → ...

Phase 7 — URL validation
  └── npm run check:dom-url

Phase 8 — QA
  └── npx tsc --noEmit && npm run lint
  └── Manual review in Sanity Studio

Phase 1 — Refresh & Verify Source Data

All data is already cached. Verify it is complete and up to date before any writes.

1.1 — Verify post count

# Prints the total number of cached posts to confirm the cache is not empty or truncated
# No output files — console only
node -e "const d=require('./scripts/hubspot-posts-cache.json'); console.log(d.length, 'posts')"

1.2 — Re-fetch if stale

# Fetches all blog configurations (names, IDs, domains) from HubSpot API
# Output → scripts/hubspot-blogs.json
npm run fetch:blogs

# Fetches all HubSpot portal users (name, email, userId)
# Output → scripts/hubspot-users.json
npm run fetch:users

# Fetches all HubSpot blog tags / taxonomy
# Output → scripts/hubspot-tags.json
npm run fetch:taxonomy

# Fetches the full HTML body of every blog post (paginated, concurrent)
# Output → scripts/hubspot-posts-content.json
npm run fetch:post-bodies

# Merges post metadata + bodies + blog info into one enriched dataset
# Reads  → hubspot-posts-cache.json + hubspot-posts-content.json + hubspot-blogs.json
# Output → scripts/hubspot-posts-enriched.json
npm run enrich:posts

1.3 — Confirm translation grouping

# Groups posts that are translations of each other using AI matching
# Reads  → scripts/hubspot-posts-enriched.json
# Output → scripts/translation-groups-ai.json  (array of groups, each group = same article in N languages)
#        → scripts/language-orphans-ai.json     (posts with no matched translation)
#        → scripts/coverage-matrix.csv          (language × domain coverage table)
npm run check:language-coverage:ai

1.4 — Inspect a single post

# Prints all fields of one post to the console — useful for debugging field mappings
# No output files — console only
npm run inspect:post -- --id <hubspot_post_id>

Phase 2 — Taxonomy: Excel → JSON → Sanity

Four tag types plus authors come from Excel files (one file per domain), not from HubSpot. The Excel data is richer and more finely categorised than the HubSpot taxonomy.

Tag types (from Excel Tags sheet — de-de only, shared globally)

Tag type | Sanity doc type | Field on blogPost
Technology Tags | technologyTag | technologyTags[] → array of references
Product Model Tags | productModelTag | productModelTags[] → array of references
Industry Tags | industryTag | industryTags[] → array of references
Application Tags | applicationTag | applicationTags[] → array of references

Authors (from Excel Authors - Person Schema sheet — de-de only)

Extracted fields per author:

Excel column | Output key | Notes
Keep as Author? | keepAsAuthor | Replace or Yes
Full Name (for Sanity) | name | Used as the Sanity person display name + slug source
Job Title | jobTitle |
Bio (50-100 words) | bio |
Headshot Available? | headshotAvailable |
LinkedIn URL | linkedIn |
Knows About (3-5 topics) | knowsAbout | Split by comma into array
Credentials | credentials |
Education | education |

Template/instruction rows are automatically filtered out using skipIfColumnEmpty: "#" and excludeNames: ["Map to Created By"] in the config.
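
The row filtering described above can be sketched as follows. This is a minimal illustration, not the converter's actual code: the SheetRow type, the filterTemplateRows function, and the nameColumn option (the assumption that excludeNames is compared against the Full Name column) are all hypothetical.

```typescript
// Hypothetical shapes: the real converter's types are not shown in this plan.
type SheetRow = Record<string, string | undefined>;

interface RowFilterConfig {
  skipIfColumnEmpty: string; // e.g. "#"
  excludeNames: string[];    // e.g. ["Map to Created By"]
  nameColumn: string;        // assumption: the column excludeNames is compared against
}

// Keeps only rows that have a value in the marker column and whose name is not excluded.
function filterTemplateRows(rows: SheetRow[], cfg: RowFilterConfig): SheetRow[] {
  return rows.filter((row) => {
    const marker = (row[cfg.skipIfColumnEmpty] || "").trim();
    if (marker === "") return false; // empty marker column: template or instruction row
    const rowName = (row[cfg.nameColumn] || "").trim();
    return cfg.excludeNames.indexOf(rowName) === -1;
  });
}

const sampleRows: SheetRow[] = [
  { "#": "1", "Full Name (for Sanity)": "Matt Caminiti" },
  { "#": "", "Full Name (for Sanity)": "Instructions: fill one row per author" },
  { "#": "2", "Full Name (for Sanity)": "Map to Created By" },
];

const keptRows = filterTemplateRows(sampleRows, {
  skipIfColumnEmpty: "#",
  excludeNames: ["Map to Created By"],
  nameColumn: "Full Name (for Sanity)",
});
console.log(keptRows.length); // 1
```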

Plus the existing HubSpot sources:

Type | Sanity doc type | Source
Blog categories | category | hubspot-blogs.json + hubspot-tags.json

Step 2.0 — Install dependencies

# Installs all npm packages including xlsx (SheetJS) which is needed to read .xlsx files
# No output files
npm install

Step 2.1 — Place Excel files

One Excel file per domain. The config (excel-convert-config.json) already covers the first 5 — add an entry for each additional domain you have an Excel file for:

scripts/excels/
  blogs-de-de.xlsx       ← Tags sheet + Authors sheet + Blog Posts sheet
  blogs-fr-fr.xlsx       ← Blog Posts sheet only
  blogs-en-us.xlsx
  blogs-es-es.xlsx
  blogs-pl-pl.xlsx
  blogs-es-mx.xlsx       ← add to config if available
  blogs-fr-be.xlsx
  blogs-nl-nl.xlsx
  blogs-nl-be.xlsx
  blogs-pt-br.xlsx
  blogs-ja-jp.xlsx
  blogs-pt-pt.xlsx

Step 2.2 — Create the config file

# Copies the example config — sheet names and column headers are already filled in
# Output → scripts/excel-convert-config.json  (ready to use, just add your domain file paths)
cp scripts/excel-convert-config.example.json scripts/excel-convert-config.json

The config is pre-filled with the real sheet and column structure:

Sheet: Tags — read from de-de only (identical across all files)

Excel column | Output key | Tag type
industries-tags | name | industryTags
Technology Tags | name | technologyTags
Product Model Tags | name | productModelTags
Application Tags | name | applicationTags

Each column is extracted as a separate outputType — 4 passes over the same sheet, one per tag type.

Sheet: Authors - Person Schema — read from de-de only, outputs authors in all-tags.json

Excel column | Output key
Keep as Author? | keepAsAuthor
Full Name (for Sanity) | name (also generates slug)
Job Title | jobTitle
Bio (50-100 words) | bio
Headshot Available? | headshotAvailable
LinkedIn URL | linkedIn
Knows About (3-5 topics) | knowsAbout (array)
Credentials | credentials
Education | education

Sheet: Blog Posts — read from every domain file

Excel column | Output key
URL | url (post identifier)
Technology Tags | technologyTags
Product Model Tags | productModelTags
Industry Tags | industryTags
Application Tags | applicationTags

Tag columns contain comma-separated tag names. The converter splits them into arrays in posts-<domain>.json, and push-taxonomy-to-sanity.ts resolves each name to a Sanity document ID.

Only the de-de entry includes the Tags and Authors sheets. All other domain entries include only the Blog Posts sheet; the tag lists are shared globally, and deduplication in merged/ handles any overlap.
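
Splitting a comma-separated tag cell could look like the sketch below. The function name splitTagCell is hypothetical, and the case-insensitive de-duplication is an assumption rather than documented behaviour. A tag name that itself contains a comma would be split incorrectly, so such names would need separate handling.

```typescript
// Splits one comma-separated Excel cell into trimmed tag names.
// Assumption: repeated names within a cell are dropped, ignoring case.
function splitTagCell(cell: string | undefined): string[] {
  if (!cell) return [];
  const seen: Record<string, boolean> = {};
  const names: string[] = [];
  cell.split(",").forEach((part) => {
    const tagName = part.trim();
    const key = tagName.toLowerCase();
    if (tagName !== "" && seen[key] !== true) {
      seen[key] = true;
      names.push(tagName);
    }
  });
  return names;
}

console.log(splitTagCell("CO2 Cleaning, Dry Ice, , dry ice")); // ["CO2 Cleaning", "Dry Ice"]
```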

Discovery mode — verify sheet names without writing anything

# Opens the Excel file and prints every sheet name + its column headers, then exits
# No output files — console only
npm run convert:taxonomy:excel -- --file scripts/excels/blogs-de-de.xlsx

Step 2.3 — Run the converter

# Converts a single domain's Excel file to JSON
# Reads  → scripts/excels/blogs-de-de.xlsx  (via config)
# Output → scripts/taxonomy-output/technologyTags-de-de.json
#        → scripts/taxonomy-output/productModelTags-de-de.json
#        → scripts/taxonomy-output/industryTags-de-de.json
#        → scripts/taxonomy-output/applicationTags-de-de.json
#        → scripts/taxonomy-output/posts-de-de.json  (post → tag mapping rows)
npm run convert:taxonomy:excel -- --domain de-de

# Converts all domains defined in the config, then deduplicates across all languages
# Reads  → all Excel files listed in scripts/excel-convert-config.json
# Output → scripts/taxonomy-output/<type>-<domain>.json  (one per sheet per domain)
#        → scripts/taxonomy-output/merged/technologyTags.json   (deduplicated — seed into Sanity)
#        → scripts/taxonomy-output/merged/productModelTags.json
#        → scripts/taxonomy-output/merged/industryTags.json
#        → scripts/taxonomy-output/merged/applicationTags.json
npm run convert:taxonomy:excel

# Same as above but also writes .xlsx files alongside the JSON (optional review format)
# Output → same as above + scripts/taxonomy-output/<type>-<domain>.xlsx
npm run convert:taxonomy:excel -- --format excel
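
The cross-domain deduplication that produces merged/ might work along these lines. This is a sketch under the assumption that tags are considered identical when their slugs match and that the first occurrence wins; TagEntry and mergeTagLists are hypothetical names.

```typescript
interface TagEntry {
  name: string;
  slug: string;
}

// Merges per-domain tag lists, keeping the first entry seen for each slug.
function mergeTagLists(lists: TagEntry[][]): TagEntry[] {
  const seenSlugs: Record<string, boolean> = {};
  const merged: TagEntry[] = [];
  lists.forEach((list) => {
    list.forEach((tag) => {
      if (seenSlugs[tag.slug] !== true) {
        seenSlugs[tag.slug] = true;
        merged.push(tag);
      }
    });
  });
  return merged;
}

const mergedTags = mergeTagLists([
  [{ name: "Automotive", slug: "automotive" }, { name: "Aerospace", slug: "aerospace" }],
  [{ name: "automotive", slug: "automotive" }],
]);
console.log(mergedTags.map((t) => t.name)); // ["Automotive", "Aerospace"]
```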

Output structure

scripts/taxonomy-output/
  all-tags.json           ← single object with all 5 types (seed into Sanity)
  posts-de-de.json        ← one row per post: url + 4 tag columns (arrays of tag names)
  posts-fr-fr.json
  posts-en-us.json
  ...

all-tags.json structure:

{
  "industryTags":     [ { "name": "Automotive", "slug": "automotive" }, ... ],
  "technologyTags":   [ { "name": "CO2 Cleaning", "slug": "co2-cleaning" }, ... ],
  "productModelTags": [ { "name": "i3 MicroClean", "slug": "i3-microclean" }, ... ],
  "applicationTags":  [ { "name": "Surface Cleaning", "slug": "surface-cleaning" }, ... ],
  "authors": [
    {
      "keepAsAuthor": "Replace",
      "name": "Matt Caminiti",
      "jobTitle": "Director, Corporate Marketing Communications & Strategy",
      "bio": "",
      "headshotAvailable": "",
      "linkedIn": "https://www.linkedin.com/in/matt-caminiti/",
      "knowsAbout": [],
      "credentials": "",
      "education": "",
      "slug": "matt-caminiti",
      "language": "de-de"
    }
  ]
}
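
The slugs in this structure are consistent with a simple lowercase-and-hyphenate rule, sketched below. slugifyName is a hypothetical helper, not the converter's confirmed implementation. Note that this version turns non-ASCII letters (for example German umlauts) into hyphens, so a real run would probably need transliteration first.

```typescript
// Lowercases the input and collapses every run of non-alphanumeric characters into one hyphen.
function slugifyName(input: string): string {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // spaces, punctuation, non-ASCII letters
    .replace(/^-+|-+$/g, "");    // no leading or trailing hyphen
}

console.log(slugifyName("CO2 Cleaning"));  // "co2-cleaning"
console.log(slugifyName("i3 MicroClean")); // "i3-microclean"
console.log(slugifyName("Matt Caminiti")); // "matt-caminiti"
```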

Each post row:

{
  "domain": "de-de",
  "url": "https://www.coldjet.com/de/blog/article-slug/",
  "technologyTags": ["CO2 Cleaning", "Dry Ice"],
  "productModelTags": ["i3 MicroClean"],
  "industryTags": ["Automotive", "Aerospace"],
  "applicationTags": ["Surface Cleaning"]
}

Step 2.4 — Sanity schema changes required

New tag schema files (shape: name, slug, description):

src/sanity/schemaTypes/technologyTag.ts
src/sanity/schemaTypes/productModelTag.ts
src/sanity/schemaTypes/industryTag.ts
src/sanity/schemaTypes/applicationTag.ts

person.ts — ensure it has fields matching the Excel author columns:

name, slug, jobTitle, bio, headshotAvailable, linkedIn,
knowsAbout (array of string), credentials, education, language

blogPost.ts — add a taxonomy group with 4 new reference array fields:

technologyTags   → array of reference → technologyTag
productModelTags → array of reference → productModelTag
industryTags     → array of reference → industryTag
applicationTags  → array of reference → applicationTag

blogPost.ts — add legacyHtml field for rollback safety:

{ name: 'legacyHtml', type: 'text', readOnly: true, hidden: true }

index.ts — register all 4 new types.
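
Since the four tag types share one shape (name, slug, description), they could be generated from a single factory. The sketch below uses plain objects so it has no imports; in the real schema files you would likely wrap these in the defineType and defineField helpers from the sanity package. makeTagType and the display titles are hypothetical.

```typescript
interface TagSchemaField {
  name: string;
  title: string;
  type: string;
  options?: { source: string };
}

interface TagSchemaType {
  name: string;
  title: string;
  type: "document";
  fields: TagSchemaField[];
}

// Builds one tag document type with the shared name / slug / description shape.
function makeTagType(typeName: string, title: string): TagSchemaType {
  return {
    name: typeName,
    title: title,
    type: "document",
    fields: [
      { name: "name", title: "Name", type: "string" },
      { name: "slug", title: "Slug", type: "slug", options: { source: "name" } },
      { name: "description", title: "Description", type: "text" },
    ],
  };
}

// The four types to register in index.ts.
const tagTypes: TagSchemaType[] = [
  makeTagType("technologyTag", "Technology Tag"),
  makeTagType("productModelTag", "Product Model Tag"),
  makeTagType("industryTag", "Industry Tag"),
  makeTagType("applicationTag", "Application Tag"),
];
console.log(tagTypes.map((t) => t.name));
```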


Step 2.5 — Seed taxonomy into Sanity

# Creates category, person, and all 4 tag type documents in the Sanity staging dataset
# Reads  → scripts/taxonomy-output/merged/technologyTags.json
#        → scripts/taxonomy-output/merged/productModelTags.json
#        → scripts/taxonomy-output/merged/industryTags.json
#        → scripts/taxonomy-output/merged/applicationTags.json
#        → scripts/hubspot-tags.json   (→ category documents)
#        → scripts/hubspot-users.json  (→ person documents)
# Output → scripts/sanity-id-map.json  (maps every hubspot/slug id → sanity document _id)
npm run push:taxonomy -- --dataset staging

# After reviewing staging in Sanity Studio — promote to production
# Reads  → same files as above
# Output → updates scripts/sanity-id-map.json with production _ids
npm run push:taxonomy -- --dataset production

scripts/sanity-id-map.json structure:

{
  "categories":       { "<hubspot_tag_id>":  "<sanity_doc_id>" },
  "persons":          { "<hubspot_user_id>": "<sanity_doc_id>" },
  "technologyTags":   { "<slug>": "<sanity_doc_id>" },
  "productModelTags": { "<slug>": "<sanity_doc_id>" },
  "industryTags":     { "<slug>": "<sanity_doc_id>" },
  "applicationTags":  { "<slug>": "<sanity_doc_id>" }
}

Step 2.6 — Build the post → tag mapping

The posts-<domain>.json files record which tag names belong to each post. The taxonomy push script consolidates them into scripts/post-tag-map.json, using sanity-id-map.json to resolve tag names to Sanity IDs:

# Produced automatically at the end of npm run push:taxonomy
# Reads  → scripts/taxonomy-output/posts-*.json  (all domains)
#        → scripts/sanity-id-map.json
# Output → scripts/post-tag-map.json

scripts/post-tag-map.json structure:

{
  "<hubspot_post_id>": {
    "technologyTags":   ["<sanity_id>", ...],
    "productModelTags": ["<sanity_id>", ...],
    "industryTags":     ["<sanity_id>", ...],
    "applicationTags":  ["<sanity_id>", ...]
  }
}
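
A sketch of how this map could be assembled is shown below. The post rows are keyed by URL while the map is keyed by HubSpot post ID, so the sketch assumes a postIdByUrl lookup (hypothetical, presumably derivable from the enriched post data). buildPostTagMap, toSlug, and the sample IDs are illustrative only. Names that cannot be resolved are collected rather than silently dropped.

```typescript
type TagKind = "technologyTags" | "productModelTags" | "industryTags" | "applicationTags";
const TAG_KINDS: TagKind[] = ["technologyTags", "productModelTags", "industryTags", "applicationTags"];

interface TaggedPostRow {
  url: string;
  technologyTags: string[];
  productModelTags: string[];
  industryTags: string[];
  applicationTags: string[];
}
type SlugToIdMap = Record<TagKind, Record<string, string>>;
type PostTagRefs = Record<TagKind, string[]>;

function toSlug(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function buildPostTagMap(
  rows: TaggedPostRow[],
  idMap: SlugToIdMap,
  postIdByUrl: Record<string, string> // assumption: URL to HubSpot post ID lookup
): { map: Record<string, PostTagRefs>; unresolved: string[] } {
  const map: Record<string, PostTagRefs> = {};
  const unresolved: string[] = [];
  rows.forEach((row) => {
    const postId = postIdByUrl[row.url];
    if (postId === undefined) {
      unresolved.push("no post id for " + row.url);
      return;
    }
    const refs: PostTagRefs = { technologyTags: [], productModelTags: [], industryTags: [], applicationTags: [] };
    TAG_KINDS.forEach((kind) => {
      row[kind].forEach((tagName) => {
        const sanityId = idMap[kind][toSlug(tagName)];
        if (sanityId === undefined) unresolved.push(kind + ": " + tagName);
        else refs[kind].push(sanityId);
      });
    });
    map[postId] = refs;
  });
  return { map, unresolved };
}

const tagMapResult = buildPostTagMap(
  [{
    url: "https://www.coldjet.com/de/blog/article-slug/",
    technologyTags: ["CO2 Cleaning", "Dry Ice"],
    productModelTags: [],
    industryTags: ["Automotive"],
    applicationTags: [],
  }],
  {
    technologyTags: { "co2-cleaning": "tech-1" },
    productModelTags: {},
    industryTags: { automotive: "ind-1" },
    applicationTags: {},
  },
  { "https://www.coldjet.com/de/blog/article-slug/": "12345" }
);
console.log(JSON.stringify(tagMapResult.unresolved)); // ["technologyTags: Dry Ice"]
```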

Phase 3 — Asset Extraction & Deduplication

scripts/all-domains-assets.json already exists — every image, video, and document URL found in blog content, grouped by domain.

3.1 — Review duplicates manually

# Prints the list of domain keys inside the asset file — shows which domains have assets
# No output files — console only
node -e "const d=require('./scripts/all-domains-assets.json'); console.log(Object.keys(d))"

Then open scripts/all-domains-assets.json and add a dedup (skip) marker to each duplicate entry so the Phase 4 upload ignores it.

Brandfolder detects binary duplicates on upload automatically. Keep the admin view open at Settings → General Settings → Advanced → Manage Deleted Assets to catch and resolve those.

3.2 — Validate URLs before uploading

# Sends a HEAD request to every asset URL and records the HTTP status
# Reads  → scripts/all-domains-assets.json
# Output → scripts/check-valid-all.json   (url + status + response time per asset)
#        → scripts/check-valid-all.csv    (same, spreadsheet-friendly)
npm run check:dom-url

Only upload assets with status 200. Fix or skip anything else before Phase 4.
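
The status-200 rule can be applied mechanically to the check results. In this sketch the url and status field names are assumed from the description of check-valid-all.json, and partitionByStatus is a hypothetical helper.

```typescript
interface AssetCheckResult {
  url: string;
  status: number;
}

// Separates assets that are safe to upload (HTTP 200) from those needing a fix or a skip.
function partitionByStatus(results: AssetCheckResult[]): { uploadable: string[]; blocked: AssetCheckResult[] } {
  const uploadable: string[] = [];
  const blocked: AssetCheckResult[] = [];
  results.forEach((r) => {
    if (r.status === 200) uploadable.push(r.url);
    else blocked.push(r);
  });
  return { uploadable, blocked };
}

const checkSplit = partitionByStatus([
  { url: "https://example.com/a.png", status: 200 },
  { url: "https://example.com/b.png", status: 404 },
  { url: "https://example.com/c.pdf", status: 301 },
]);
console.log(checkSplit.uploadable.length, checkSplit.blocked.length); // 1 2
```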


Phase 4 — Upload Assets to Brandfolder

# Uploads every non-skipped, validated asset to Brandfolder
# Reads  → scripts/all-domains-assets.json   (asset list with dedup markers)
#        → scripts/check-valid-all.json       (skip anything not 200)
# Output → scripts/brandfolder-url-map.json  (old hubspot url → new brandfolder CDN url)
#        → Sanity migrationAssetLog documents (one per uploaded asset: sourceUrl, brandfolderId, CDN url, uploadedAt)
npm run upload:brandfolder

scripts/brandfolder-url-map.json is the critical handoff file to Phase 6. Every HubSpot CDN link in post content will be replaced using this map.
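
The replacement step could be as simple as the sketch below, assuming the map is a flat object of old URL to new URL. swapAssetUrls is a hypothetical name, and the URLs in the example (including the /hubfs/ path and cdn.example.com) are placeholders. Longer source URLs are replaced first so that a URL with a query string is not partially rewritten by its shorter prefix.

```typescript
// Replaces every occurrence of each old URL with its mapped new URL.
function swapAssetUrls(html: string, urlMap: Record<string, string>): string {
  // Longest keys first: "a.png?width=300" must be handled before "a.png".
  const oldUrls = Object.keys(urlMap).sort((a, b) => b.length - a.length);
  let out = html;
  oldUrls.forEach((oldUrl) => {
    const newUrl = urlMap[oldUrl];
    if (newUrl !== undefined) out = out.split(oldUrl).join(newUrl);
  });
  return out;
}

const swapped = swapAssetUrls(
  '<img src="https://blog.coldjet.com/hubfs/a.png?width=300"><img src="https://blog.coldjet.com/hubfs/a.png">',
  {
    "https://blog.coldjet.com/hubfs/a.png": "https://cdn.example.com/a.png",
    "https://blog.coldjet.com/hubfs/a.png?width=300": "https://cdn.example.com/a-300.png",
  }
);
console.log(swapped);
```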


Phase 5 — Content Block Analysis (Claude AI)

Transform raw HubSpot HTML into typed Sanity bodySections blocks using the Anthropic Message Batches API.

# Submits all posts for the given domain to Claude via the Batches API, then polls for results
# Reads  → scripts/hubspot-posts-enriched.json   (post metadata + domain info)
#        → scripts/hubspot-posts-content.json     (raw HTML bodies)
#        → scripts/post-blocks-cache.json         (resume state — skip already-processed posts)
# Output → scripts/hubspot-posts-blocks.json      (structured bodySections array per post ID)
#        → scripts/post-blocks-cache.json         (updated cache — safe to resume from here)
#        → scripts/post-blocks-batches.json       (Anthropic batch IDs and status)
#        → scripts/post-blocks-errors.json        (posts Claude could not parse — fix manually)
npm run analyze:post-blocks:claude -- --domain de-de

# Same but processes all domains in sequence
# Output → same files as above, populated for all posts across all domains
npm run analyze:post-blocks:claude

The script is fully resumable — if it is stopped mid-run, re-running picks up from the cache.
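
Resuming from the cache amounts to skipping post IDs that already have an entry. The sketch assumes post-blocks-cache.json is an object keyed by post ID, which this plan does not confirm; pendingPostIds is a hypothetical name.

```typescript
// Returns the IDs that still need processing, preserving input order.
function pendingPostIds(allIds: string[], processed: Record<string, unknown>): string[] {
  return allIds.filter((id) => !Object.prototype.hasOwnProperty.call(processed, id));
}

const stillPending = pendingPostIds(
  ["101", "102", "103"],
  { "101": { bodySections: [] }, "103": { bodySections: [] } }
);
console.log(stillPending); // ["102"]
```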

Block types produced

Block type | Source trigger
bodyText | Every H2/H3 section and surrounding paragraphs
bodyImage | <img> tags extracted from content flow
bodyVideo | YouTube / Vimeo / Wistia iframes
bodyBlockquote | <blockquote> elements
comparisonTable | <table> elements
faqItem | FAQ patterns detected by Claude
howToModule | Numbered step guides
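
Because the analysis output feeds straight into Sanity, it may be worth checking that every returned block uses one of the seven known types before pushing. The sketch assumes each block carries a _type field, as Sanity array items usually do; findUnknownBlockTypes is hypothetical, and ctaBanner is an invented example of an unexpected type.

```typescript
const BLOCK_TYPES: string[] = [
  "bodyText", "bodyImage", "bodyVideo", "bodyBlockquote",
  "comparisonTable", "faqItem", "howToModule",
];

interface SectionBlock {
  _type: string;
}

// Lists each block type that is not in the known set, once per type.
function findUnknownBlockTypes(blocks: SectionBlock[]): string[] {
  const unknown: string[] = [];
  blocks.forEach((b) => {
    if (BLOCK_TYPES.indexOf(b._type) === -1 && unknown.indexOf(b._type) === -1) {
      unknown.push(b._type);
    }
  });
  return unknown;
}

const unknownTypes = findUnknownBlockTypes([
  { _type: "bodyText" },
  { _type: "ctaBanner" },
  { _type: "bodyImage" },
  { _type: "ctaBanner" },
]);
console.log(unknownTypes); // ["ctaBanner"]
```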

Phase 6 — Sanity Blog Post Migration (Per Domain)

Action: Write scripts/push-posts-to-sanity.ts (new script)

Accepts --domain and --dataset flags. For each post in the target domain:

  1. Read translation-groups-ai.json — find all posts for that domain
  2. Map HubSpot fields → Sanity blogPost fields
  3. Resolve author / category references from sanity-id-map.json
  4. Resolve the 4 tag reference arrays from post-tag-map.json
  5. Swap asset URLs using brandfolder-url-map.json
  6. Write bodySections from hubspot-posts-blocks.json
  7. Write raw HTML to legacyHtml field
  8. Create the document with _type: 'blogPost'
  9. Output scripts/sanity-migration-log-<domain>.json for audit

# Creates all blog post documents for de-de in the staging dataset
# Reads  → scripts/translation-groups-ai.json   (which posts belong to de-de)
#        → scripts/hubspot-posts-enriched.json   (fields: title, slug, publishedAt, author...)
#        → scripts/hubspot-posts-blocks.json     (bodySections blocks from Phase 5)
#        → scripts/hubspot-posts-content.json    (raw HTML for legacyHtml field)
#        → scripts/sanity-id-map.json            (resolve author, category, tag refs)
#        → scripts/post-tag-map.json             (resolve 4 tag arrays)
#        → scripts/brandfolder-url-map.json      (swap image URLs)
# Output → Sanity blogPost documents in the staging dataset
#        → scripts/sanity-migration-log-de-de.json  (log of every created/skipped/failed doc)
npm run push:posts:sanity -- --domain de-de --dataset staging

# After QA passes in Sanity Studio — promote to production
# Reads  → same files as above
# Output → Sanity blogPost documents in the production dataset
#        → scripts/sanity-migration-log-de-de.json  (updated with production doc _ids)
npm run push:posts:sanity -- --domain de-de --dataset production
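
Steps 2 to 8 of the list above could be assembled by a pure function like the one sketched here, leaving file reads and the Sanity client call to the surrounding script. Everything in it is an assumption for illustration: the EnrichedPostLite fields, the authorId key, the deterministic _id pattern, and the function names. It also assumes asset URLs were already swapped in bodySections before this step.

```typescript
interface EnrichedPostLite {
  id: string;
  title: string;
  slug: string;
  publishedAt: string;
  authorId: string; // hypothetical field name
}
interface RefItem {
  _type: "reference";
  _ref: string;
  _key: string;
}
interface TagRefIds {
  technologyTags: string[];
  productModelTags: string[];
  industryTags: string[];
  applicationTags: string[];
}
interface MigrationInputs {
  personIds: Record<string, string>; // from sanity-id-map.json
  tagIds: TagRefIds;                 // from post-tag-map.json
  bodySections: unknown[];           // from hubspot-posts-blocks.json
  legacyHtml: string;                // from hubspot-posts-content.json
}

// Sanity array items need a unique _key; the index is enough within one array.
function asRefs(ids: string[]): RefItem[] {
  return ids.map((id, i) => {
    const ref: RefItem = { _type: "reference", _ref: id, _key: "ref-" + i };
    return ref;
  });
}

function buildBlogPostDoc(post: EnrichedPostLite, inputs: MigrationInputs) {
  const authorRef = inputs.personIds[post.authorId];
  return {
    _id: "blogPost-" + post.id, // deterministic ID so re-runs overwrite instead of duplicating
    _type: "blogPost",
    title: post.title,
    slug: { _type: "slug", current: post.slug },
    publishedAt: post.publishedAt,
    author: authorRef === undefined ? undefined : { _type: "reference", _ref: authorRef },
    technologyTags: asRefs(inputs.tagIds.technologyTags),
    productModelTags: asRefs(inputs.tagIds.productModelTags),
    industryTags: asRefs(inputs.tagIds.industryTags),
    applicationTags: asRefs(inputs.tagIds.applicationTags),
    bodySections: inputs.bodySections,
    legacyHtml: inputs.legacyHtml,
  };
}

const sampleDoc = buildBlogPostDoc(
  { id: "12345", title: "Beispiel", slug: "article-slug", publishedAt: "2024-01-15T09:00:00Z", authorId: "u1" },
  {
    personIds: { u1: "person-1" },
    tagIds: { technologyTags: ["tech-1", "tech-2"], productModelTags: [], industryTags: ["ind-1"], applicationTags: [] },
    bodySections: [],
    legacyHtml: "<p>Beispiel</p>",
  }
);
console.log(sampleDoc._id); // "blogPost-12345"
```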

Migration order

de-de → fr-fr → en-us → es-es → pl-pl → es-mx → fr-be → nl-nl → nl-be → pt-br → ja-jp → zh-cn → pt-pt

Start with de-de — one of the largest domains — so issues surface early. Academy blogs (en, pl, de, fr, ja) should be confirmed in scope separately before migrating.

Per-domain checklist


Phase 7 — URL Validation Pass

# Re-validates all URLs that appear in migrated content — catches any Brandfolder CDN issues
# Reads  → scripts/brandfolder-url-map.json  (all new CDN URLs to check)
# Output → scripts/check-valid-all.json       (updated — any non-200 URLs flagged for manual fix)
#        → scripts/check-valid-all.csv
npm run check:dom-url

Use scripts/brandfolder-url-map.json to find any remaining HubSpot CDN URLs not yet replaced and fix them before the next domain runs.
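
One way to run that check is to search the migrated content for any key of the URL map, since every key is an old source URL. The sketch assumes the map is a flat old-to-new object and that the migrated content is available as a string (for example a serialised document); findUnreplacedUrls and the sample URLs are hypothetical.

```typescript
// Returns every old URL from the map that still appears in the given content.
function findUnreplacedUrls(content: string, urlMap: Record<string, string>): string[] {
  return Object.keys(urlMap).filter((oldUrl) => content.indexOf(oldUrl) !== -1);
}

const leftoverUrls = findUnreplacedUrls(
  '<img src="https://cdn.example.com/a.png"><img src="https://blog.coldjet.com/hubfs/b.png">',
  {
    "https://blog.coldjet.com/hubfs/a.png": "https://cdn.example.com/a.png",
    "https://blog.coldjet.com/hubfs/b.png": "https://cdn.example.com/b.png",
  }
);
console.log(leftoverUrls); // ["https://blog.coldjet.com/hubfs/b.png"]
```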


Phase 8 — QA & Finalisation

# Checks TypeScript types across the entire project — must pass with zero errors
# No output files — exits non-zero if there are type errors
npx tsc --noEmit

# Runs ESLint across the project — must pass with zero errors before committing
# No output files — exits non-zero on lint errors
npm run lint

Manual checks per domain batch

  1. Browse 10 random posts in Sanity Studio — are bodySections blocks correct?
  2. Are all images displaying via Brandfolder CDN?
  3. Are all 4 tag types populated correctly on each post?
  4. Are author bylines and categories populated with references (not raw strings)?
  5. Does the slug match the original HubSpot URL path (for redirect mapping)?
  6. If a post had a video — does the bodyVideo block have the correct embed URL?
  7. Check scripts/post-blocks-errors.json — manually fix any posts Claude could not parse

Phase 9 — SEO Scoring & AI Content Enhancement (Post-Migration)

These two scripts run after migration is complete and posts are live in Sanity. They are optional but strongly recommended before launching each country.


9.1 — SEO Scoring Script

Script: scripts/score-seo.ts ✅ Written
Command: npm run score:seo

Scores every post against 11 SEO best-practice rules and writes the results to JSON/CSV. No content is changed — read-only analysis. Max score is 90 pts.

Scoring rules (11 rules, max 90 pts)

Rule | Check | Points
Title length | Between 50–60 characters | 10
Title has primary keyword | Keyword (derived from slug) appears in title | 10
Meta description length | Between 120–160 characters | 10
H1 present | At least one <h1> in body HTML | 5
H2/H3 structure | At least 2 section headings in body | 10
Word count | Body content ≥ 300 words | 10
Image alt text | All <img> tags in body have non-empty alt text | 10
Featured image | featuredImage field is populated | 5
Keyword density | Primary keyword appears 1–3% of body word count | 10
Slug quality | Lowercase, hyphens only, ≤ 75 chars | 5
Internal links | At least 1 internal link (coldjet.com or relative) in body | 5
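
To make the point arithmetic concrete, here is a sketch of four of the eleven rules. It is not the code in score-seo.ts: SeoInput, scoreBasicRules, and the slug pattern (which assumes digits are allowed alongside lowercase letters and hyphens) are illustrative.

```typescript
interface SeoInput {
  title: string;
  metaDescription: string;
  slug: string;
  bodyText: string; // body with HTML already stripped
}
interface RuleOutcome {
  rule: string;
  pass: boolean;
  points: number;
}

function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Evaluates four rules and sums the points of those that pass.
function scoreBasicRules(post: SeoInput): { score: number; outcomes: RuleOutcome[] } {
  const outcomes: RuleOutcome[] = [
    { rule: "titleLength", pass: post.title.length >= 50 && post.title.length <= 60, points: 10 },
    { rule: "metaLength", pass: post.metaDescription.length >= 120 && post.metaDescription.length <= 160, points: 10 },
    { rule: "wordCount", pass: countWords(post.bodyText) >= 300, points: 10 },
    { rule: "slugQuality", pass: /^[a-z0-9]+(-[a-z0-9]+)*$/.test(post.slug) && post.slug.length <= 75, points: 5 },
  ];
  let score = 0;
  outcomes.forEach((o) => {
    if (o.pass) score += o.points;
  });
  return { score, outcomes };
}

const basicScore = scoreBasicRules({
  title: "3 ways dry ice blasting is used in the automotive industry", // 58 characters
  metaDescription: new Array(131).join("x"),                           // 130 characters
  slug: "3-ways-dry-ice-blasting-is-used-in-the-automotive-industry",
  bodyText: new Array(219).join("word "),                              // 218 words
});
console.log(basicScore.score); // 25 (title 10 + meta 10 + slug 5; word count fails)
```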

Usage

# Score all posts across all domains
# Reads  → scripts/hubspot-posts-enriched.json
#        → scripts/hubspot-posts-content.json
# Output → scripts/seo-scores.json   (one entry per post with score + per-rule breakdown)
#        → scripts/seo-scores.csv    (same, spreadsheet-friendly)
npm run score:seo

# Score one domain only
npm run score:seo -- --domain de-de

# Show only posts below threshold + print worst 10 to console (default threshold: 60)
# Output → scripts/seo-scores-bad.json  (failing posts sorted worst-first)
npm run score:seo -- --domain de-de --threshold 60 --bad-only

Output structure

scripts/seo-scores.json — array of post results:

[
  {
    "postId": "7359675231",
    "domain": "en-us",
    "title": "3 ways dry ice blasting is used in the automotive industry",
    "slug": "3-ways-dry-ice-blasting-is-used-in-the-automotive-industry",
    "score": 65,
    "passed": 8,
    "failed": 3,
    "rules": {
      "titleLength":    { "pass": true,  "note": "58 chars — ideal (50–60)" },
      "wordCount":      { "pass": false, "note": "218 words — below 300 minimum" },
      "imageAltText":   { "pass": false, "note": "2 of 4 images missing alt text" },
      "internalLinks":  { "pass": true,  "note": "3 internal links found" }
    }
  }
]

scripts/seo-scores-bad.json — only failing posts (score < threshold), sorted worst-first. Same shape as above.


9.2 — AI Content Enhancement Script

Script to write: scripts/enhance-content-claude.ts
Command to add: npm run enhance:content

Uses Claude to improve post content based on the SEO score results. This script is gated behind an explicit flag — it will refuse to run unless --ai-enhance is passed. This prevents accidental bulk rewrites.

Enhanced posts are flagged in both the JSON output and in Sanity so editors know to review them before publishing.

What Claude enhances (per failing rule in seo-scores.json)

Failing rule Enhancement
titleLength Rewrites title to fit 50–60 chars while keeping meaning
metaLength Rewrites meta description to hit 120–160 chars
imageAltText Generates descriptive alt text from image URL + post context
keywordDensity Adjusts keyword placement in bodyText blocks
wordCount Flags post as too short — suggests expansion topics (does not auto-expand)

Claude never auto-expands short posts; those require human judgement. It only makes narrowly scoped rewrites: title, meta description, alt text, and keyword placement in bodyText blocks.

Usage

# Without --ai-enhance: the script exits immediately with an explanation
npm run enhance:content -- --domain de-de
# ❌  This script rewrites Sanity content. Pass --ai-enhance to confirm.

# With the flag: processes all posts below --threshold (default 60) for the domain
# Reads  → scripts/seo-scores.json         (which posts need enhancement and which rules failed)
#        → Sanity blogPost documents        (current field values)
# Output → scripts/enhance-log-de-de.json  (what was changed, old value vs new value)
#        → Sanity blogPost documents        (updated fields + aiEnhanced: true flag)
npm run enhance:content -- --domain de-de --ai-enhance

# Dry run — shows what would change without writing anything to Sanity
# Output → scripts/enhance-preview-de-de.json  (proposed changes only)
npm run enhance:content -- --domain de-de --ai-enhance --dry-run
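
Since this script is still to be written, the gate could be implemented as a small pure check that runs before anything else. The sketch below is one possible shape; checkEnhanceGate and GateResult are hypothetical names, and only the refusal message is taken from the usage shown above.

```typescript
interface GateResult {
  allowed: boolean;
  dryRun: boolean;
  message: string;
}

// Decides from the raw CLI arguments whether the script may proceed and whether it may write.
function checkEnhanceGate(argv: string[]): GateResult {
  const allowed = argv.indexOf("--ai-enhance") !== -1;
  const dryRun = argv.indexOf("--dry-run") !== -1;
  if (!allowed) {
    return { allowed: false, dryRun, message: "This script rewrites Sanity content. Pass --ai-enhance to confirm." };
  }
  return { allowed: true, dryRun, message: dryRun ? "Dry run: no writes to Sanity." : "Writing changes to Sanity." };
}

console.log(checkEnhanceGate(["--domain", "de-de"]).allowed);                             // false
console.log(checkEnhanceGate(["--domain", "de-de", "--ai-enhance", "--dry-run"]).dryRun); // true
```

In the real script the caller would exit with a non-zero status when allowed is false, and skip every Sanity write when dryRun is true.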

Sanity flag on enhanced posts

A boolean field aiEnhanced (default false) is set to true on every post that Claude touches. This lets editors filter and review all AI-enhanced posts in Sanity Studio before sign-off:

// In blogPost.ts — add to schema:
{
  name: 'aiEnhanced',
  title: 'AI Enhanced — Needs Review',
  type: 'boolean',
  initialValue: false,
  description: 'Set automatically when Claude rewrites any field. Must be reviewed before publishing.',
}

Editors can query all enhanced posts in Sanity Studio:

*[_type == "blogPost" && aiEnhanced == true] | order(publishedAt desc)

Enhancement log

scripts/enhance-log-de-de.json — full audit trail of every change:

[
  {
    "postId": "blogPost-abc123",
    "domain": "de-de",
    "field": "title",
    "before": "How Dry Ice Cleaning Works In Industrial Applications And Why You Should Care",
    "after": "How Dry Ice Cleaning Works in Industrial Applications",
    "rule": "titleLength",
    "enhancedAt": "2026-04-21T14:30:00Z"
  }
]

9.3 — Workflow: Score → Review → Enhance → Re-score

# 1. Score all posts after migration
npm run score:seo -- --domain de-de

# 2. Review scripts/seo-scores-bad.json — decide which posts are worth enhancing

# 3. Run enhancement (gated — requires explicit flag)
npm run enhance:content -- --domain de-de --ai-enhance --dry-run   # preview first
npm run enhance:content -- --domain de-de --ai-enhance             # apply

# 4. Re-score to verify improvement
npm run score:seo -- --domain de-de

# 5. In Sanity Studio — review all aiEnhanced posts before publishing
# GROQ: *[_type == "blogPost" && aiEnhanced == true && domain == "de-de"]

New scripts summary

Script | Command | Gate flag | Reads | Writes
score-seo.ts | npm run score:seo | None (read-only) | hubspot-posts-enriched.json + hubspot-posts-content.json | seo-scores.json, seo-scores-bad.json, seo-scores.csv
enhance-content-claude.ts | npm run enhance:content | --ai-enhance required | seo-scores.json + Sanity | enhance-log-<domain>.json + Sanity updates + aiEnhanced flag

All Scripts — Reference Table

Script | Command | Status | Reads | Writes
fetch-hubspot-blogs.ts | npm run fetch:blogs | ✅ Exists | HubSpot API | hubspot-blogs.json
fetch-hubspot-users.ts | npm run fetch:users | ✅ Exists | HubSpot API | hubspot-users.json
fetch-hubspot-taxonomy.ts | npm run fetch:taxonomy | ✅ Exists | HubSpot API | hubspot-tags.json
fetch-post-bodies.ts | npm run fetch:post-bodies | ✅ Exists | HubSpot API | hubspot-posts-content.json
enrich-hubspot-posts.ts | npm run enrich:posts | ✅ Exists | posts-cache + content + blogs | hubspot-posts-enriched.json
check-language-coverage-ai.ts | npm run check:language-coverage:ai | ✅ Exists | posts-enriched | translation-groups-ai.json, language-orphans-ai.json, coverage-matrix.csv
extract-domain-assets.ts | npm run extract:assets | ✅ Exists | posts-content | all-domains-assets.json
check-dom-url.ts | npm run check:dom-url | ✅ Exists | all-domains-assets / brandfolder-url-map | check-valid-all.json, check-valid-all.csv
upload-to-brandfolder.ts | npm run upload:brandfolder | ✅ Exists | all-domains-assets + check-valid | brandfolder-url-map.json + Sanity migrationAssetLog
analyze-posts-blocks-claude.ts | npm run analyze:post-blocks:claude | ✅ Exists | posts-enriched + posts-content | hubspot-posts-blocks.json, post-blocks-cache.json, post-blocks-errors.json
convert-taxonomy-excel.ts | npm run convert:taxonomy:excel | ✅ Written | Excel files + config | taxonomy-output/<type>-<domain>.json, taxonomy-output/merged/<type>.json
push-taxonomy-to-sanity.ts | npm run push:taxonomy | ⏳ To write | merged taxonomy JSONs + hubspot-tags/users | sanity-id-map.json, post-tag-map.json + Sanity docs
push-posts-to-sanity.ts | npm run push:posts:sanity | ⏳ To write | all JSON files above | sanity-migration-log-<domain>.json + Sanity blogPost docs
score-seo.ts | npm run score:seo | ✅ Written | hubspot-posts-enriched.json + hubspot-posts-content.json | seo-scores.json, seo-scores-bad.json, seo-scores.csv
enhance-content-claude.ts | npm run enhance:content | ⏳ To write | seo-scores.json + Sanity | enhance-log-<domain>.json + Sanity updates

All scripts support --limit <n> for testing with a small sample. Never use --limit in a full domain run.