Integration Layer (Silver) Modeling Standards

Status: Draft v1 — proposed standards, pending team + management review. Owner: Enterprise Data Analytics engineering team. Scope: All dbt models under models/integration_layer/ in this repo. Audience: QTS data engineers, AEs, external partners (e.g. PHdata), and AI coding assistants.

0. Relationship to .github/dbt_style_guide.md

A unified QTS dbt style guide already exists at .github/dbt_style_guide.md (last updated 2025-07-31). It is the authoritative source for shared SQL, Jinja, Python, and YAML conventions across the entire repo (IL, PL, snapshots, macros) and is what .github/copilot-instructions.md points GitHub Copilot at.

This document does not duplicate that guide. It defers to it for:

This document adds the IL-specific layer on top:

If this document and .github/dbt_style_guide.md ever conflict, the global style guide wins on shared concerns (formatting, base naming) and this document wins on IL-specific concerns (cleansing, dedup, audit, soft delete).

1. Purpose

The Integration Layer (IL) is the standardized, cleansed contract between raw source data and downstream business modeling. It is what other teams, analysts, semantic models, and AI tools should depend on instead of raw tables.

The IL exists to:

The IL is not the place for:

2. Scope and layer terminology

Layer Repo path Database pattern Purpose
Raw (Bronze) Snowflake RAW_* databases RAW_*_DB Untransformed source data, owned by ingestion.
Integration (Silver) models/integration_layer/ EDW_<TARGET>_IL_DB Standardized, cleansed, source-faithful.
Presentation (Gold) models/presentation_layer/ EDW_<TARGET>_PL_DB Business-modeled facts, dims, and bridges.

This document governs the Integration Layer only.

3. Folder structure

The current convention in this repo, formalized:

models/integration_layer/
├── <source_system>/                     ← one folder per source system
│   ├── <source_system>.yaml             ← model + column docs and tests
│   ├── cleansing_transformations/       ← cleansed, source-faithful models
│   │   └── <source_abbrev>_<entity>.sql
│   └── business_transformations/        ← cross-entity reshaping within one source
│       └── <source_abbrev>_<entity>.sql
└── dimensions/                          ← cross-source IL-level dim helpers (rare)

Rules:

  1. Every source system gets its own top-level folder under integration_layer/.
  2. The first level inside a source folder reflects subject-area or domain grouping when the source is large (e.g. workday_hcm/organization/). Otherwise it is cleansing_transformations/ and business_transformations/ directly.
  3. cleansing_transformations/ is for models that are 1:1 with a single source table and apply renaming, casting, NULL handling, and dedup only.
  4. business_transformations/ is for models that combine multiple cleansing_transformations outputs from the same source into a more useful shape. Cross-source joining stays in the Presentation Layer.

4. Model naming

IL model names

Format: <source_abbrev>_<entity> (singular, snake_case).

Source Abbreviation Example
Salesforce sf sf_quote
Workday HCM wd_hcm wd_hcm_worker
Workday Financials wd_fin wd_fin_invoice_line
Workday Adaptive Planning wd_ap wd_ap_account
Procore pc pc_daily_log
Smartsheet ss ss_project
Datahawk dh dh_market
8x8 eb eb_call

Rules:

  1. Models are singular (sf_quote, not sf_quotes). One row = one entity.
  2. Lowercase snake_case, ASCII only.
  3. No abbreviations beyond the established source abbreviation. Prefer wd_hcm_worker_position over wd_hcm_wkr_pos.
  4. Deprecated suffixes: legacy models use a _bl suffix (e.g. wd_hcm_organization_bl). New IL models must not add this suffix. The pattern is being retired. When touching legacy _bl models, rename if it can be done as part of the same PR; otherwise leave for a focused rename PR.
  5. Fivetran salesforce_formula_utils exception. Models generated by the fivetran/salesforce_formula_utils package end in _formula (e.g. sf_quote_formula). The suffix is package-defined, not arbitrary, and must not be edited by hand — regeneration will overwrite changes. These models are exempt from rules 1–4 above for the suffix only.

IL fact / dim naming

IL business_transformation models occasionally take fact/dim shape inside a single source (e.g. sf_quote_dim, sf_opportunity_fact). In active use today is the suffix style: <source_abbrev>_<entity>_<dim|fact>.

Presentation Layer naming (informational only)

This document does not govern Presentation Layer naming. PL conventions vary across the repo (dim_fin_company vs wd_ap_dim_account_pl) and are out of scope for the IL standards. See .ai/findings/repo-survey-2026-06-17.md §3 for the open decision.

File names match model names

models/integration_layer/workday_hcm/organization/cleansing_transformations/wd_hcm_worker.sql materializes as wd_hcm_worker.

5. Column naming

  1. Lowercase snake_case. No camelCase, no PascalCase, no spaces, no hyphens.
  2. Primary keys are descriptive, never id in the projected column list.
    • worker_id, daily_log_id, quote_id
    • id, pk, key
    • When the source column is id, rename in the cleansing model: id as worker_id.
    • Exceptions:
      • The incremental unique_key config may continue to reference id while the underlying column is being renamed end-to-end (transition window).
      • Models generated by salesforce_formula_utils keep unique_key="id" to stay compatible with the package.
      • Snapshots in snapshots/ may use unique_key: id because they read from source(...) directly, before any IL rename has happened.
  3. Foreign keys carry the parent entity name + _id. A foreign key from wd_hcm_worker_position to wd_hcm_worker is worker_id in both tables.
  4. Surrogate keys end in _key (Presentation Layer concept; rare in IL).
  5. Do not abbreviate. customer_id, not cust_id. order_id, not o_id.
  6. Do not use reserved words. Avoid select, from, group, order, class, type (use <entity>_type instead).
  7. Booleans start with is_ or has_ and read as a statement.
    • is_active, has_dependents, is_terminated, is_floorplan
    • active, terminated, floorplan
    • Rename at IL: active as is_active.
  8. Timestamps end in _at and are stored in UTC.
    • created_at, updated_at, terminated_at
    • If a non-UTC timestamp is genuinely required, suffix the timezone: created_at_pt. UTC remains the default.
  9. Dates end in _date.
    • hire_date, invoice_date, effective_date
  10. Event verbs are past tense. created, updated, deleted, submitted.
  11. Generic columns get the entity prefix so they remain unambiguous after joins:
    • quote_status, quote_type, quote_created_at
    • status, type, created_at (when ambiguous post-join)

6. Data types

Domain Snowflake type Notes
Money / amounts NUMBER(38,2) Never FLOAT. Use even when the source is FLOAT — cast in IL.
Counts / quantities NUMBER(38,0) or NUMBER(38,n) Match the precision the business requires.
IDs VARCHAR (string) All IDs are strings, including numeric-looking ones.
Booleans BOOLEAN Cast 'Y'/'N', 1/0, etc. to true booleans in IL.
Timestamps TIMESTAMP_TZ Stored in UTC.
Dates DATE
Text VARCHAR Unbounded by default; let Snowflake handle it.
Semi-structured VARIANT / OBJECT / ARRAY Flatten in IL where the structure is known and stable.

7. NULL and blank handling

  1. Empty strings '' become NULL at IL.
  2. String-encoded nulls ('NULL', 'null', 'N/A', '-') become NULL at IL when you can confirm they represent missing data.
  3. Trim leading and trailing whitespace from strings.
  4. Source-system sentinel values (e.g. 9999-12-31 "high date") are kept as-is if the business uses them as sentinels; otherwise convert to NULL and document the choice in the model description.
  5. Required PK columns must be NOT NULL after IL transformation. If a row would have a NULL PK, filter it out in IL and document why.

8. Soft deletes and Fivetran metadata

The current standard pattern (matches wd_hcm_worker.sql):

with
    source as (select * from {{ source('workday_hcm', 'worker') }}),

    renamed as (
        select
            id as worker_id,
            active as is_active,
            hire_date,
            termination_date,
            worker_code,
            user_id as worker_user_id,
            _fivetran_deleted,
            _fivetran_synced

        from source
    )

select * exclude (_fivetran_deleted)
from renamed
where _fivetran_deleted <> true or _fivetran_deleted is null

Rules:

  1. Always preserve _fivetran_synced as the freshness indicator. Carry it through to the IL output column unchanged.
  2. Filter out soft-deleted rows in IL using the where _fivetran_deleted <> true or _fivetran_deleted is null pattern, then exclude the column from the final select.
  3. If a downstream consumer needs deleted rows for historical reasons, build a separate IL model with _deleted in the name; do not mix live and deleted rows in a single IL output.
  4. For sources without Fivetran metadata, document the freshness column used instead (e.g. last_modified, etl_loaded_at).

9. Audit columns and source lineage

Every IL cleansing_transformations model must expose, in this order at the end of the column list:

_fivetran_synced     -- or equivalent source freshness column

Optional, encouraged for non-Fivetran sources or where the team agrees:

source_system        -- short identifier of the source, e.g. 'workday_hcm'
loaded_at            -- timestamp the data landed in Snowflake (UTC)

These are added at IL, not in PL. PL inherits them via ref().

10. Deduplication

When a source is versioned (multiple rows per business key with a freshness timestamp), deduplicate in IL using the canonical pattern:

with
    source as (select * from {{ source('foo', 'bar') }}),

    deduped as (
        select *,
               row_number() over (
                   partition by <business_key>
                   order by _fivetran_synced desc
               ) as _row_num
        from source
    )

select <columns>
from deduped
where _row_num = 1

Rules:

  1. The business key is documented in the model YAML description.
  2. The ordering column is _fivetran_synced for Fivetran sources; otherwise the source's last_modified / updated_at / equivalent.
  3. The dedup CTE name is deduped for consistency.
  4. Drop _row_num from the final select.

11. Materialization

Defaults are set in dbt_project.yml. Override only with a documented reason.

Situation Materialization
Source is small, downstream usage is light view (default)
Source is large or downstream queries are slow on a view table
Source is large and grows append-only or update-by-key incremental
Source is small but heavily reused across PL table

For incremental models:

  1. unique_key should reference the descriptive PK name (e.g. unique_key='daily_log_id'). The legacy 'id' is acceptable during a transition window per §5.2 exceptions; new hand-written incrementals should start with the descriptive name.

  2. on_schema_change must be set explicitly. Default to 'append_new_columns' unless there is a reason to fail.

  3. Incremental filters use >= not > to avoid missing records that share a timestamp with the high-water mark.

    {% if is_incremental() %}
        where _fivetran_synced >= (select max(_fivetran_synced) from {{ this }})
    {% endif %}

Salesforce _formula incremental pattern (do not edit by hand)

Salesforce cleansing models generated by salesforce_formula_utils follow a fixed pattern that the standards do not override:

{{ config(
    materialized="incremental",
    incremental_strategy="merge",
    unique_key="id",
    data-removed="append_new_columns"
) }}
{{ get_formula_model("<source_object>", incremental_this=(this if is_incremental() else none)) }}

Keep unique_key="id", keep incremental_strategy="merge", do not edit the file body — it will be overwritten by the package.

12. Required dbt tests

In the source-folder YAML, every IL model must have:

  1. Primary keyunique and not_null on the PK column. Reference the descriptive PK name (worker_id), not id.
  2. Foreign keysrelationships test to the parent IL model where the parent is also in the IL.
  3. Enumerated columnsaccepted_values for status / type columns whose set is small and stable.
  4. Not-null on required business columns — at minimum the columns the model description claims are required.

Use the modern dbt syntax (flags: require_generic_test_arguments_property: true is set in dbt_project.yml):

columns:
  - name: worker_id
    data_type: varchar
    description: "Workday worker ID. Primary key."
    data_tests:
      - unique
      - not_null

13. SQL style

Most rules are enforced by .sqlfluff. The standards doc only states conventions not already covered there:

  1. One CTE = one logical step. Name CTEs descriptively (renamed, deduped, joined_to_workers).
  2. {{ ref() }} and {{ source() }} calls go in import CTEs at the top of the file, named after the table they reference.
  3. Limit data scanned by import CTEs. Select only the columns used; apply where filters at the top, not at the bottom.
  4. Last line is select * from <final_cte> so model output can be audited by swapping the final CTE name during development.
  5. Use union all unless deduplication via union is genuinely intended.
  6. Be explicit about join type: inner join, left join, full outer join. Never bare join.
  7. Move left to right in joins. A right join usually means the FROM table is wrong.
  8. No select * in final output. Enumerate columns. select * exclude (...) is allowed when the exclusion list is short and intentional (see the soft-delete pattern above).
  9. No commented-out code in committed files. Use Jinja comments ({# … #}) for explanatory comments that should not appear in compiled SQL.

14. File header

Every .sql file in the IL begins with:

/*

Created By: <Author Name>
Created On: YYYY-MM-DD
Description: <one-sentence description of what this model is and its grain>

Updated By:
Updated On:
Description:

*/

When making non-trivial changes, append (do not overwrite) an Updated By / Updated On / Description block.

15. Documentation requirements

Where IL model YAML lives

The repo currently has IL model documentation split across two locations:

Standard for new work: put model documentation in models/_model_yamls/<source>/<source>.yaml. Migrate co-located YAML to that location as part of any meaningful PR that touches the affected models.

The gold-standard reference is models/_model_yamls/prowatch/prowatch.yaml. New IL YAML should match its quality bar.

Required content per IL model

  1. A model description that includes the grain ("one row per …") and the business key.
  2. A description on every column — not "". At minimum, describe the PK, FKs, status/type columns, all monetary columns, and all timestamp/date columns. Generic boilerplate is not acceptable.
  3. data_type on every column where the type is not obvious from a sibling IL model.
  4. The required tests from §12.

Why empty descriptions matter: dbt_project.yml sets +persist_docs: { relation: true, columns: true }, which pushes descriptions to Snowflake's COMMENT metadata. Empty descriptions surface as gaps in the Snowflake catalog and any AI/semantic tooling that reads from INFORMATION_SCHEMA.

Empty descriptions are blockers in code review.

16. Sources

Source declarations live in models/_source_yamls/<source>/. Each new source must declare:

version: 2

sources:
  - name: <source_name>
    database: edw_stage_db          # current default for Fivetran-landed sources
    schema: <schema>
    config:
      tags: ["<source_name>"]
      loaded_at_field: _fivetran_synced
      freshness:
        warn_after:  { count: 24, period: hour }
        error_after: { count: 72, period: hour }

    tables:
      - name: <table_name_in_dbt>
        identifier: <actual_table_in_snowflake>   # only when remapping

Rules:

  1. Source files live under models/_source_yamls/<source>/, never co-located with models.
  2. database and schema are set explicitly.
  3. loaded_at_field is set: _fivetran_synced for Fivetran sources, otherwise the source's freshness column (e.g. last_modified).
  4. freshness is set with warn_after and error_after thresholds calibrated to the connector's sync schedule.
  5. Source-level tags are applied for orchestration.
  6. Use identifier: to remap a source table when its raw name collides with another source (e.g. hcm_organization aliasing organization).
  7. New sources are listed in the appropriate folder; do not put a new source into an existing source's YAML file.

17. Snapshots

Snapshots live in the top-level snapshots/ directory, in YAML files named _<source>_snapshots.yml (or a _<source>_snapshots/ directory of YAML fragments for very large source families).

Canonical snapshot config:

- name: wd_hcm_worker_snapshot
  relation: source('workday_hcm', 'worker')
  config:
    tags: <source>-snapshots
    schema: SNAPSHOT
    database: "{{ target.database }}"
    unique_key: id              # source PK as it lands; rename happens at IL
    strategy: timestamp
    updated_at: _fivetran_synced
    dbt_valid_to_current: "to_date('9999-12-31')"

Rules:

  1. Snapshots read from source(...), not from IL models.
  2. Snapshot model names are <source_abbrev>_<entity>_snapshot.
  3. schema: SNAPSHOT and database: "{{ target.database }}".
  4. Strategy is timestamp; updated_at is _fivetran_synced for Fivetran sources, otherwise the source's freshness column.
  5. dbt_valid_to_current: "to_date('9999-12-31')" is the team's standard sentinel for the current row.
  6. unique_key: id is acceptable here even though IL models rename id — snapshots predate the rename and operate on the source as it landed.
  7. Composite keys are written as a YAML list: unique_key: [worker_id, manager_id, position_id].
  8. Tags follow the pattern <source>-snapshots (e.g. hcm-snapshots).

18. Use existing macros before writing inline logic

The macros/ folder already contains utilities that should be reused instead of re-implementing inline in models. Always check macros/ first.

Macro When to use
norm_str(expr) Case-insensitive normalization for joins / comparisons. Returns upper(trim(...)).
norm_key(parts) Building composite natural keys for dedup or surrogate-key inputs.
json_value(column, path, cast_type) Extracting paths from VARIANT/JSON columns. Use this instead of writing raw (col:path)::type casts inline.
with_unknown_row(model_ref, column_values) Adding an "Unknown" sentinel row to a dimension. PL use; mention in IL only when an IL dim genuinely needs one.
quoted_column_not_null Custom dbt test for columns whose names need quoting (reserved words, special characters).
convert_to_est(column_name) UTC → America/New_York conversion. Never use in IL — IL stays UTC. PL only, when a stakeholder explicitly wants ET.

From installed packages:

If you find yourself writing inline column-cleansing logic that feels repeatable across more than one model, raise a follow-up to add it as a macro rather than copy-pasting.

19. PR expectations

This section extends, not replaces, .github/pull_request_template.md. A PR that adds or modifies an IL model must include:

  1. The .sql file with header and SQL conforming to this document.
  2. The corresponding YAML entry in models/_model_yamls/<source>/<source>.yaml with description, columns, data_types, and tests (see §15).
  3. A green dbt build --select state:modified+ locally.
  4. No select * in final output (see §13.8).
  5. A note in the PR description listing the model name, the grain, and the primary key.
  6. The il_model_review_checklist.md is filled out and pasted into the PR description, or referenced as already reviewed.

20. What this document does not cover (yet)

The following are intentionally out of scope for v1 and will be addressed in follow-up PRs:

21. Change history

Version Date Author Notes
v1.1 (draft) 2026-06-17 Steve Farmer Repo survey integration: added §0 (relationship to .github/dbt_style_guide.md), §16 Sources, §17 Snapshots, §18 Existing macros. Updated §4 (Salesforce _formula exception, IL fact/dim suffix style, PL naming deferred), §5 (id PK exception list), §11 (Salesforce formula incremental pattern), §15 (_model_yamls/ location, prowatch as gold-standard, +persist_docs rationale), §19 (PR template cross-reference), §20 (sources/macros removed; PL naming added).
v1 (draft) 2026-06-16 Steve Farmer Initial draft for team review.