8  OMOP CDM SQL Files Explanation

This document explains the SQL files generated by the CommonDataModel R package for OMOP CDM v5.4 PostgreSQL implementation.

8.1 File Overview

The db/5.4/postgresql/ directory contains 4 essential SQL files that work together to create a complete OMOP CDM database:

db/5.4/postgresql/
 OMOPCDM_postgresql_5.4_ddl.sql           (32KB) - Table definitions
 OMOPCDM_postgresql_5.4_primary_keys.sql  (3KB)  - Primary key constraints
 OMOPCDM_postgresql_5.4_indices.sql       (10KB) - Database indexes
 OMOPCDM_postgresql_5.4_constraints.sql   (32KB) - Foreign key constraints

8.2 1. DDL File (OMOPCDM_postgresql_5.4_ddl.sql)

Creates all OMOP CDM tables using Data Definition Language (DDL). This is the foundation that defines the structure of your healthcare data warehouse.

8.2.1 Key SQL Syntax Elements

8.2.1.1 Table Creation Structure

CREATE TABLE @cdmDatabaseSchema.person (
    person_id integer NOT NULL,
    gender_concept_id integer NOT NULL,
    year_of_birth integer NOT NULL,
    month_of_birth integer NULL,
    day_of_birth integer NULL,
    birth_datetime TIMESTAMP NULL,
    -- more columns...
);

8.2.1.2 Important PostgreSQL-Specific Features

  • Schema Placeholder: @cdmDatabaseSchema gets replaced with actual schema name during execution
  • Data Types:
    • integer for IDs and numeric values
    • varchar(n) for text with length limits
    • date for dates without time
    • TIMESTAMP for datetime values
    • NUMERIC for precise decimal values
    • TEXT for unlimited text
  • Null Constraints: NOT NULL ensures required fields are populated
  • Distribution Hints: --HINT DISTRIBUTE ON KEY (person_id) optimizes for distributed databases

8.2.2 Table Categories

8.2.2.1 1. Clinical Data Tables (Patient-Centric)

  • person: Demographics and basic patient information
  • observation_period: Time periods when patients are observable in the data
  • visit_occurrence: Hospital visits, appointments, encounters
  • visit_detail: Sub-components of visits (e.g., ICU stay within hospitalization)
  • condition_occurrence: Diagnoses and medical conditions
  • drug_exposure: Medications and drug administrations
  • procedure_occurrence: Medical procedures and interventions
  • device_exposure: Medical devices used
  • measurement: Lab results, vital signs, diagnostic measurements
  • observation: Clinical observations and assessments
  • death: Death information
  • note: Clinical notes and text documents
  • specimen: Biological specimens collected

8.2.2.2 2. Health System Tables

  • location: Geographic locations
  • care_site: Healthcare facilities and departments
  • provider: Healthcare providers (doctors, nurses, etc.)

8.2.2.3 3. Health Economics Tables

  • payer_plan_period: Insurance coverage periods
  • cost: Healthcare costs and billing information

8.2.2.4 4. Derived Data Tables

  • drug_era: Continuous periods of drug exposure
  • dose_era: Periods of consistent drug dosing
  • condition_era: Continuous periods of condition occurrence
  • episode: Clinical episodes grouping related events

8.2.2.5 5. Vocabulary Tables (Standardization)

  • concept: Central vocabulary concepts (standardized terms)
  • vocabulary: Different terminology systems (ICD-10, SNOMED, etc.)
  • domain: Clinical domains (Drug, Condition, Procedure, etc.)
  • concept_class: Classifications within domains
  • concept_relationship: Relationships between concepts
  • concept_ancestor: Hierarchical relationships
  • source_to_concept_map: Mapping source codes to standard concepts

8.2.2.6 6. Metadata and Research Tables

  • cdm_source: Information about the CDM instance
  • metadata: Metadata about the database
  • cohort: Research cohorts
  • cohort_definition: Cohort definitions

8.3 2. Primary Keys File (OMOPCDM_postgresql_5.4_primary_keys.sql)

Adds primary key constraints to ensure unique record identification and data integrity.

8.3.1 SQL Syntax

ALTER TABLE @cdmDatabaseSchema.person
ADD CONSTRAINT xpk_person PRIMARY KEY (person_id);

8.3.2 What Primary Keys Do

  • Uniqueness: Ensures no duplicate records
  • Performance: Automatically creates an index for fast lookups
  • Referential Integrity: Enables foreign key relationships
  • Naming Convention: All constraints named with xpk_ prefix

8.3.3 Key Examples

  • person_id in person table
  • visit_occurrence_id in visit_occurrence table
  • condition_occurrence_id in condition_occurrence table
  • concept_id in concept table

8.4 3. Indices File (OMOPCDM_postgresql_5.4_indices.sql)

8.4.1 Purpose

Creates database indexes to dramatically improve query performance, especially for large datasets.

8.4.2 Key SQL Syntax

8.4.2.1 Standard Index Creation

CREATE INDEX idx_person_id ON @cdmDatabaseSchema.person (person_id ASC);

8.4.2.2 PostgreSQL Clustering (Physical Data Organization)

CLUSTER @cdmDatabaseSchema.person USING idx_person_id;

8.4.3 Index Strategy by Table Type

8.4.3.1 Patient-Centric Tables

-- Clustered by person_id for patient-level queries
CREATE INDEX idx_condition_person_id_1 ON condition_occurrence (person_id ASC);
CLUSTER condition_occurrence USING idx_condition_person_id_1;

-- Additional indexes for common queries
CREATE INDEX idx_condition_concept_id_1 ON condition_occurrence (condition_concept_id ASC);
CREATE INDEX idx_condition_visit_id_1 ON condition_occurrence (visit_occurrence_id ASC);

8.4.3.2 Vocabulary Tables

-- Clustered by primary concept lookups
CREATE INDEX idx_concept_concept_id ON concept (concept_id ASC);
CLUSTER concept USING idx_concept_concept_id;

-- Additional indexes for vocabulary searches
CREATE INDEX idx_concept_code ON concept (concept_code ASC);
CREATE INDEX idx_concept_vocabulary_id ON concept (vocabulary_id ASC);

8.4.4 Performance Benefits

  • Clustered Indexes: Physically reorder table data for optimal access patterns
  • Foreign Key Indexes: Speed up joins between tables
  • Query-Specific Indexes: Support common analytical queries

8.5 4. Constraints File (OMOPCDM_postgresql_5.4_constraints.sql)

8.5.1 Purpose

Enforces referential integrity through foreign key relationships, ensuring data consistency across the entire database.

8.5.2 SQL Syntax

ALTER TABLE @cdmDatabaseSchema.person
ADD CONSTRAINT fpk_person_gender_concept_id
FOREIGN KEY (gender_concept_id)
REFERENCES @cdmDatabaseSchema.CONCEPT (CONCEPT_ID);

8.5.3 Key Relationship Patterns

8.5.3.2 2. Patient-Centric Design

-- All clinical events link back to patients
ALTER TABLE visit_occurrence ADD CONSTRAINT fpk_visit_occurrence_person_id
FOREIGN KEY (person_id) REFERENCES PERSON (PERSON_ID);

ALTER TABLE condition_occurrence ADD CONSTRAINT fpk_condition_occurrence_person_id
FOREIGN KEY (person_id) REFERENCES PERSON (PERSON_ID);

8.5.3.3 3. Visit Hierarchy

-- Clinical events can be linked to visits
ALTER TABLE condition_occurrence ADD CONSTRAINT fpk_condition_occurrence_visit_occurrence_id
FOREIGN KEY (visit_occurrence_id) REFERENCES VISIT_OCCURRENCE (VISIT_OCCURRENCE_ID);

-- Visit details link to main visits
ALTER TABLE visit_detail ADD CONSTRAINT fpk_visit_detail_visit_occurrence_id
FOREIGN KEY (visit_occurrence_id) REFERENCES VISIT_OCCURRENCE (VISIT_OCCURRENCE_ID);

8.5.3.4 4. Provider and Care Site Tracking

-- Events can be linked to providers and care sites
ALTER TABLE condition_occurrence ADD CONSTRAINT fpk_condition_occurrence_provider_id
FOREIGN KEY (provider_id) REFERENCES PROVIDER (PROVIDER_ID);

ALTER TABLE visit_occurrence ADD CONSTRAINT fpk_visit_occurrence_care_site_id
FOREIGN KEY (care_site_id) REFERENCES CARE_SITE (CARE_SITE_ID);

8.6 Execution Order

These files must be executed in the correct sequence:

  1. DDL � Create all table structures
  2. Primary Keys � Add unique constraints and primary indexes
  3. Indices � Add performance optimization indexes
  4. Constraints � Enforce referential integrity
-- Example execution in R (from your cdm-build.qmd)
CommonDataModel::executeDdl(
    connectionDetails = cd,
    cdmVersion = "5.4",
    cdmDatabaseSchema = "ohdsi_demo"
)

8.7 PostgreSQL-Specific Features

8.7.1 Data Types

  • TIMESTAMP for precise datetime storage
  • NUMERIC for exact decimal arithmetic
  • TEXT for unlimited text length
  • varchar(n) for limited text with specific length

8.7.2 Performance Optimizations

  • CLUSTER commands physically reorder data
  • Strategic indexing on person_id for patient-centric queries
  • Vocabulary indexes for concept lookups
  • Distribution hints for parallel processing systems

8.7.3 Constraints and Integrity

  • Comprehensive foreign key relationships
  • Named constraints for easy identification
  • Cascade options for data management

8.8 OMOP CDM Design Principles

8.8.1 1. Standardization

  • All clinical concepts mapped to standard vocabularies
  • Consistent data types and naming conventions
  • Uniform structure across institutions

8.8.2 2. Patient-Centricity

  • Person table as the core entity
  • All clinical events linked to patients
  • Longitudinal view of patient data

8.8.3 3. Flexibility

  • Source value fields preserve original data
  • Multiple concept fields (standard + source)
  • Extensible vocabulary system

8.8.4 4. Performance

  • Optimized for analytical queries
  • Strategic indexing and clustering
  • Efficient joins and aggregations

8.9 Common Query Patterns

8.9.1 Patient Demographics

SELECT
    p.person_id,
    c_gender.concept_name AS gender,
    c_race.concept_name AS race,
    p.year_of_birth
FROM person p
JOIN concept c_gender ON p.gender_concept_id = c_gender.concept_id
JOIN concept c_race ON p.race_concept_id = c_race.concept_id;

8.9.2 Patient Conditions

SELECT
    p.person_id,
    c.concept_name AS condition_name,
    co.condition_start_date
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.domain_id = 'Condition';

8.9.3 Drug Exposures

SELECT
    p.person_id,
    c.concept_name AS drug_name,
    de.drug_exposure_start_date,
    de.days_supply
FROM person p
JOIN drug_exposure de ON p.person_id = de.person_id
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.domain_id = 'Drug';

This structure provides a robust foundation for healthcare data analysis, enabling standardized research across different healthcare systems and institutions.