37 NRRD: Spacing and Origin

Let me build the intuition first, then connect it to the PyTorch ecosystem.

37.1 What spacing and origin actually encode

A NRRD file (like DICOM, NIfTI, MHA) stores two different things:

A raw voxel array — a grid of intensity numbers, indexed by (i, j, k).
Spatial metadata that tells you how that grid sits in the physical world (the patient’s body, in millimeters).

   Voxel grid (array indices)          Physical/patient space (mm)
   ┌──┬──┬──┬──┐                        y(mm)
   │  │  │  │  │   i,j,k                 ▲
   ├──┼──┼──┼──┤   integers   ───────►   │   ╱ patient anatomy
   │  │  │  │  │                         │  ╱  (real cm/mm)
   └──┴──┴──┴──┘                         └──────────► x(mm)
        ▲
   "just numbers"                   "where is this in the body?"

The metadata that bridges these two worlds:

Spacing: the physical distance between the centers of adjacent voxels along each axis, in mm (for contiguous slices this is also the voxel size). E.g. (0.7, 0.7, 3.0) means each voxel is 0.7 mm wide and tall but 3 mm thick. Such anisotropic (non-cubic) voxels are extremely common in CT/MR.
Origin (NRRD calls it space origin): the physical coordinate (x, y, z), in mm, of the center of the very first voxel, index (0, 0, 0).
Direction (NRRD: space directions): the orientation of the grid, i.e. which way the array axes point in the patient (left/right, anterior/posterior, superior/inferior). In NRRD each space directions vector also carries the spacing as its length. Origin, spacing and direction together form the affine matrix.

The whole thing collapses into one equation:

   ⎡x⎤   ⎡           ⎤ ⎡i⎤   ⎡origin_x⎤
   ⎢y⎥ = ⎢ direction ⎥ ⎢j⎥ + ⎢origin_y⎥     (the 3×3 matrix is direction × diag(spacing))
   ⎣z⎦   ⎣ × spacing ⎦ ⎣k⎦   ⎣origin_z⎦

   voxel index ──► physical position in the patient (mm)

That affine is the entire point of the metadata: it’s the dictionary that translates “array element” ↔︎ “place in the body.”
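A minimal sketch of that equation in plain Python, assuming the direction matrix is given as three rows whose columns are the unit vectors of the i, j and k axes. The helper name index_to_physical and the example values are illustrative and not part of any library; SimpleITK exposes the same mapping on an image as TransformIndexToPhysicalPoint.

```python
def index_to_physical(index, spacing, origin, direction):
    """Map a voxel index (i, j, k) to a physical point (x, y, z) in mm.

    direction is a 3x3 matrix (list of rows); column a is the unit vector
    of array axis a expressed in patient coordinates.
    """
    # Scale each index by its spacing: step count -> distance in mm.
    scaled = [index[a] * spacing[a] for a in range(3)]
    # Rotate into patient axes, then shift by the origin.
    return tuple(
        origin[r] + sum(direction[r][a] * scaled[a] for a in range(3))
        for r in range(3)
    )


# Illustrative geometry: axis-aligned grid, anisotropic spacing.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
spacing = (0.7, 0.7, 3.0)
origin = (-150.0, -150.0, -40.0)

first_voxel = index_to_physical((0, 0, 0), spacing, origin, identity)
later_voxel = index_to_physical((10, 20, 10), spacing, origin, identity)

print("index (0, 0, 0)    ->", first_voxel)   # the origin itself
print("index (10, 20, 10) ->", later_voxel)
```

Index (0, 0, 0) lands on the origin, and ten steps along k move the point 30 mm while ten steps along i move it only 7 mm.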

37.2 Why it matters for an AI pipeline

Here are the concrete failure modes — these are real bugs, not theory.

1. Anisotropic voxels distort anatomy if you treat the array naively. If you feed a 512×512×60 array straight into a CNN and implicitly assume the voxels are cubes, a 3 mm-thick slice gets treated as if it were the same physical size as a 0.7 mm in-plane pixel. A sphere becomes a squashed ellipsoid in “voxel space.”

   True anatomy (mm)        Naive array view (voxels)
        ●  round nodule          ▬▬▬  same nodule looks flat
       ╱ ╲                        because z-spacing (3mm)
      ●   ●                       ≠ xy-spacing (0.7mm)
       ╲ ╱
        ●

The network learns the wrong geometry, and it won’t generalize across scanners that use different slice thicknesses.
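To put numbers on the distortion, here is a small sketch in plain Python using the shape and spacing from the example above. The 6 mm nodule radius is an assumption chosen for illustration.

```python
shape = (512, 512, 60)       # voxels along i, j, k
spacing = (0.7, 0.7, 3.0)    # mm per voxel along i, j, k

# Physical size of the whole volume in mm along each axis.
extent_mm = tuple(n * s for n, s in zip(shape, spacing))


def radius_in_voxels(radius_mm, spacing):
    """How many voxels a physical radius spans along each axis."""
    return tuple(radius_mm / s for s in spacing)


# A round nodule with a 6 mm radius (hypothetical value).
nodule_radius = radius_in_voxels(6.0, spacing)
anisotropy = max(spacing) / min(spacing)

print("extent (mm):", extent_mm)
print("6 mm radius in voxels:", nodule_radius)
print("anisotropy ratio:", round(anisotropy, 2))
```

The same nodule spans roughly 8.6 voxels in-plane but only 2 voxels through-plane, which is the flattening shown in the diagram.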

2. Scanner/protocol harmonization. Your training data comes from many scanners (GE, Siemens, Philips) and many acquisition protocols, each with different spacing. A model that memorizes “tumor = 20 voxels” is meaningless when 20 voxels means 14 mm at 0.7 mm spacing and 60 mm at 3 mm spacing. The standard fix is to resample everything to a canonical spacing (e.g. 1×1×1 mm) so the network sees a consistent physical scale. You cannot do that without the spacing field.
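As a rough sketch of what resampling to a canonical spacing implies, the snippet below only computes the target grid size that covers the same physical extent. It does not interpolate any intensities, which is the part a library transform handles. The helper names and the second acquisition are hypothetical.

```python
def resampled_shape(shape, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Grid size covering the same physical extent at new_spacing."""
    return tuple(
        max(1, round(n * s / t))
        for n, s, t in zip(shape, spacing, new_spacing)
    )


def length_in_voxels(length_mm, spacing_mm):
    """Number of voxels a physical length spans at a given spacing."""
    return length_mm / spacing_mm


# Two hypothetical acquisitions with different protocols.
thick_slices = resampled_shape((512, 512, 60), (0.7, 0.7, 3.0))
thin_slices = resampled_shape((512, 512, 300), (0.8, 0.8, 0.6))

print("thick-slice scan at 1 mm:", thick_slices)
print("thin-slice scan at 1 mm: ", thin_slices)

# A 14 mm lesion spans a different voxel count at each spacing,
# but always 14 voxels once everything is on the 1 mm grid.
for s in (0.7, 3.0, 1.0):
    print(f"14 mm at {s} mm spacing: {length_in_voxels(14.0, s):.2f} voxels")
```

After resampling, both volumes have 180 slices along k and voxel counts mean the same physical length in every scan.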

3. Any physical measurement depends on it. This is the one radiologists care about most. Volume, RECIST diameter, distances:

   tumor volume = (#voxels in mask) × (spacing_x × spacing_y × spacing_z)
                                    └──────── voxel volume in mm³ ───────┘

A perfect segmentation with the wrong spacing gives the wrong volume → wrong clinical report. The mask is “correct” in voxel space and useless in patient space.
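The formula above is a one-liner in code. This sketch assumes a mask with 10,000 foreground voxels (a made-up count) and shows how far the result drifts if the spacing is silently replaced by the default (1, 1, 1).

```python
def mask_volume_mm3(voxel_count, spacing):
    """Volume of a binary mask: number of voxels times voxel volume."""
    voxel_volume = spacing[0] * spacing[1] * spacing[2]   # mm^3 per voxel
    return voxel_count * voxel_volume


voxel_count = 10_000   # hypothetical number of foreground voxels

true_volume = mask_volume_mm3(voxel_count, (0.7, 0.7, 3.0))
naive_volume = mask_volume_mm3(voxel_count, (1.0, 1.0, 1.0))

print(f"with real spacing:    {true_volume / 1000:.1f} mL")
print(f"with default spacing: {naive_volume / 1000:.1f} mL")
print(f"relative error:       {abs(naive_volume - true_volume) / true_volume:.0%}")
```

The segmentation is identical in both cases; only the spacing differs, and the reported volume is off by about a third.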

4. Image ↔︎ label alignment. Your CT and its segmentation mask (or a second MR sequence) must share the same origin/spacing/direction to overlay correctly. A tiny origin mismatch silently shifts the mask off the anatomy, and your Dice score tanks for a reason that has nothing to do with the model.

   image origin = (-150, -150, -40)     mask origin = (-150, -150,  0)
   ┌──────────────┐                     ┌──────────────┐
   │   liver      │        overlay      │              │ ← mask shifted
   │  ▓▓▓▓        │        ─────►       │      ▓▓▓▓    │   40mm in z!
   └──────────────┘                     └──────────────┘
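A cheap guard is to compare the geometry of image and mask before computing any metric. The sketch below works on plain dicts so it runs without any imaging library; the function name, the dict layout and the 1e-3 tolerance are assumptions for illustration. The origins are the ones from the diagram above.

```python
import math


def geometry_mismatches(a, b, tol=1e-3):
    """Return the names of the geometry fields that differ between a and b.

    a and b are dicts with 'size', 'spacing', 'origin' and 'direction'
    entries (flat sequences of numbers). tol is an absolute tolerance.
    """
    mismatched = []
    for key in ("size", "spacing", "origin", "direction"):
        va, vb = a[key], b[key]
        same = len(va) == len(vb) and all(
            math.isclose(x, y, rel_tol=0.0, abs_tol=tol)
            for x, y in zip(va, vb)
        )
        if not same:
            mismatched.append(key)
    return mismatched


identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
image = {
    "size": (512, 512, 60),
    "spacing": (0.7, 0.7, 3.0),
    "origin": (-150.0, -150.0, -40.0),
    "direction": identity,
}
# Same grid, but the mask was saved with a different z origin.
mask = dict(image, origin=(-150.0, -150.0, 0.0))

problems = geometry_mismatches(image, mask)
if problems:
    print("image and mask disagree on:", ", ".join(problems))
```

Running a check like this at dataset-loading time turns a silent 40 mm shift into an explicit error message.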

5. Inference must map back to patient space. After your model predicts a mask, you have to write it back out with the original spacing and origin so it lands on the right place in the PACS viewer. If you discard the affine during preprocessing, the prediction floats in the wrong location and a radiologist can’t trust it.
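To see why the original geometry has to survive preprocessing, consider a prediction made on a 1 mm working grid that must be reported on the original 0.7 × 0.7 × 3.0 mm grid. The sketch below assumes, for simplicity, that both grids share the same origin and an axis-aligned direction, so only the spacing differs; the function name is hypothetical, and a real pipeline would resample the whole mask rather than convert single indices.

```python
def work_index_to_original(index, work_spacing, orig_spacing):
    """Map an index on the resampled (working) grid to the original grid.

    Assumes both grids share the same origin and direction, so the
    physical offset from the origin is all that needs converting.
    """
    # Working-grid index -> physical offset from the origin, in mm.
    offset_mm = [i * s for i, s in zip(index, work_spacing)]
    # Physical offset -> nearest voxel on the original grid.
    return tuple(round(d / s) for d, s in zip(offset_mm, orig_spacing))


predicted = (100, 100, 30)   # voxel flagged by the model on the 1 mm grid
original = work_index_to_original(predicted, (1.0, 1.0, 1.0), (0.7, 0.7, 3.0))

print("working-grid index: ", predicted)
print("original-grid index:", original)
```

Without the original spacing, index (100, 100, 30) would be written out as is and would point at a location tens of millimeters away from the finding.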

37.3 Where this lives in the PyTorch ecosystem

You generally won’t hand-roll the affine math. The medical DL libraries handle it, but only if you preserve the metadata. The key players:

SimpleITK / pynrrd / nibabel: load the volume together with its spacing, origin and direction. SimpleITK reads both NRRD and NIfTI, pynrrd reads NRRD, and nibabel reads NIfTI but not NRRD. Don’t load the data as a raw numpy array and throw the header away; that’s the classic beginner mistake.
MONAI: the de facto medical DL library on top of PyTorch. Its transforms are spacing-aware:

   LoadImaged          → reads array + affine (keeps metadata)
   Orientationd        → standardize direction (e.g. to RAS)
   Spacingd            → resample to canonical spacing, e.g. (1,1,1)
   ... normal training ...
   Invertd             → undo the transforms, write prediction
                          back in the ORIGINAL spacing/origin

TorchIO — similar philosophy, carries the affine through augmentations so rotations/flips happen in physical space.

The mental model to lock in: the voxel array is only half the data. The spacing and origin are what make those numbers medical rather than just a 3D tensor. Strip them out and you’ve thrown away the patient’s geometry — which is usually the thing you’re trying to measure.

A good habit from day one: whenever you load a volume with SimpleITK, print image.GetSpacing(), image.GetOrigin() and image.GetDirection(), and sanity-check them before anything else.
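That habit can be wrapped in a small helper. The sketch below is duck-typed: it accepts any object with SimpleITK-style GetSize, GetSpacing, GetOrigin and GetDirection methods. StubImage is only a stand-in so the example runs without SimpleITK installed, and the function name and the anisotropy threshold of 3 are assumptions. One thing worth remembering when reading the output: SimpleITK reports spacing in (x, y, z) order, while sitk.GetArrayFromImage returns an array indexed (z, y, x).

```python
class StubImage:
    """Stand-in exposing the SimpleITK-style getters used below."""

    def __init__(self, size, spacing, origin, direction):
        self._size, self._spacing = size, spacing
        self._origin, self._direction = origin, direction

    def GetSize(self):
        return self._size

    def GetSpacing(self):
        return self._spacing

    def GetOrigin(self):
        return self._origin

    def GetDirection(self):
        return self._direction


def geometry_report(image, max_anisotropy=3.0):
    """Collect the geometry of an image and flag suspicious values."""
    spacing = tuple(image.GetSpacing())
    origin = tuple(image.GetOrigin())
    warnings = []
    # Unit spacing plus zero origin is what you get when an image is
    # built from a bare array, so it often means the header was lost.
    if all(abs(s - 1.0) < 1e-6 for s in spacing) and all(abs(o) < 1e-6 for o in origin):
        warnings.append("default geometry: header may have been lost")
    if max(spacing) / min(spacing) > max_anisotropy:
        warnings.append("strongly anisotropic voxels: consider resampling")
    return {
        "size": tuple(image.GetSize()),
        "spacing": spacing,
        "origin": origin,
        "direction": tuple(image.GetDirection()),
        "warnings": warnings,
    }


identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
ct = StubImage((512, 512, 60), (0.7, 0.7, 3.0), (-150.0, -150.0, -40.0), identity)
stripped = StubImage((512, 512, 60), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), identity)

ct_report = geometry_report(ct)
stripped_report = geometry_report(stripped)
for name, report in (("ct", ct_report), ("stripped", stripped_report)):
    print(name, report["spacing"], report["origin"], report["warnings"])

# With SimpleITK installed, the same helper would be used like this
# (the file name is a placeholder):
#   import SimpleITK as sitk
#   print(geometry_report(sitk.ReadImage("scan.nrrd")))
```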