5. Architecture and Design

This section provides an overview of the architecture and design principles behind HepSW. HepSW is built around several key concepts that ensure transparency, reproducibility, and adaptability in building HEP software stacks from source. We took inspiration from existing build systems and package managers, but focused on a source-first approach that emphasizes explicitness and clarity.

5.1. System Overview

HepSW orchestrates the build process through several interconnected components:

digraph system_overview { rankdir=LR; node [shape=box, style="rounded,filled", fillcolor=lightblue]; manifest [label="Package\nManifests", fillcolor=lightgreen]; parser [label="Manifest\nParser"]; resolver [label="Dependency\nResolver"]; fetcher [label="Source\nFetcher"]; builder [label="Build\nEngine"]; installer [label="Install\nManager"]; envgen [label="Environment\nGenerator"]; manifest -> parser; parser -> resolver; resolver -> fetcher; fetcher -> builder; builder -> installer; installer -> envgen; cache [label="Source Cache", shape=cylinder, fillcolor=lightyellow]; workspace [label="Workspace", shape=folder, fillcolor=lightyellow]; fetcher -> cache [style=dashed, label="reuse"]; builder -> workspace; installer -> workspace; envgen -> workspace; }

The main components of HepSW’s architecture include:

  • Manifests: Each package is defined by a YAML document describing metadata, dependencies, build instructions, and configuration options

  • Dependency Management: Robust system tracking relationships between packages, versions, and constraints with SAT-based conflict resolution

  • Build Engine: Orchestrates the build process based on manifest information, retrieving sources and executing builds in controlled environments

  • Environment Management: Tools for managing software environments including variables, paths, and configurations for seamless integration

Each of these components is discussed in detail in the following sections.

5.2. Core Principles

HepSW’s core principles are centered around the needs of physicists using HEP software and contributors to HEP projects:

digraph core_principles { node [shape=box, style="rounded,filled", fillcolor=lightcyan]; hepsw [label="HepSW", shape=ellipse, fillcolor=lightgreen]; source [label="Source First"]; repro [label="Reproducibility"]; adapt [label="Adaptability"]; mod [label="Modularity"]; hepsw -> source; hepsw -> repro; hepsw -> adapt; hepsw -> mod; }

5.2.1. Why These Principles Matter

Source-first means every build is traceable and auditable. When a build fails, you can inspect the exact commands run, the source code used, and the environment it ran in. This transparency is essential for debugging complex dependency issues common in HEP stacks. Unlike binary distributions that hide the build process, HepSW makes every step explicit and inspectable.

Reproducibility means the same manifest should produce identical results on different machines. HepSW achieves this through explicit dependency versions, deterministic build flags, and isolated build environments. This is critical for collaboration—if it builds on your laptop, it should build on your colleague’s workstation, whether they’re at a university, national lab, or working remotely.

Adaptability acknowledges that HEP software must work across diverse computing environments: university clusters, national labs, cloud platforms, and developer laptops. HepSW doesn’t assume a specific OS distribution, filesystem layout, or centralized infrastructure like LxPlus or CVMFS. This is especially important for distributed collaborations like FCC and DUNE where team members work from institutions around the world.

Modularity means packages are independently buildable. You shouldn’t need to build all of ROOT to get Geant4 working. Users install only what they need, and developers can test changes to individual packages without rebuilding the world. This also enables faster iteration during development and testing.

5.2.2. Why Source-First?

Building software from source seems fragile and unreliable at first glance, especially when compared to binary distributions or package managers that provide pre-built packages. HepSW embraces this approach for several critical reasons:

1. Solving Real Collaboration Problems

Large HEP experiments like FCC and DUNE need consistent software environments across distributed teams working at different institutions worldwide. Not everyone has access to CERN infrastructure (LxPlus, CVMFS), and even those who do often need local development environments for testing and debugging. HepSW provides that consistency without requiring centralized infrastructure or institutional access.

2. Addressing Binary Distribution Fragmentation

Currently, most HEP software lacks coordinated binary distribution. When binaries exist, they’re fragmented across conda-forge, CVMFS, experiment-specific repositories, and ad-hoc institutional builds. Different experiments maintain their own binary stacks with varying degrees of compatibility. HepSW doesn’t replace these systems—it provides the build recipes that could unify them while letting users build locally when binaries aren’t available or don’t match their needs.

3. Enabling Development and Testing

HEP software stacks are complex and constantly evolving. Developers need to test their code against various dependency versions and configurations. A new update in Key4HEP should be testable against the entire stack immediately, ensuring that breaking changes are caught before release. HepSW guarantees a clean build and testing procedure out-of-the-box, essential for continuous integration and validation.

4. Transparency and Control

Relying on pre-built binaries can lead to compatibility issues, version mismatches, and hidden dependencies. When something breaks, debugging becomes difficult because you can’t see how the software was built or what compilation flags were used. By building from source, HepSW ensures that users have full control over the build process, allowing them to adapt to changes in the software ecosystem and maintain reproducibility across different environments.

5.3. Component Details

5.3.1. Manifests: The Single Source of Truth

Each package in HepSW is defined by a YAML manifest that answers four fundamental questions:

  1. What is this package? (name, version, description)

  2. Where does it come from? (git repository, tarball URL, tag/commit)

  3. How do you build it? (configure, compile, install commands)

  4. What does it need? (dependencies with version constraints)

5.3.1.1. Minimal Manifest Example

name: example-package
version: 1.0.0
description: Example HEP analysis package

source:
  type: git
  url: https://github.com/hep-org/example
  tag: v1.0.0

dependencies:
  - name: cmake
    version: ">=3.20"
  - name: root
    version: "^6.28.0"

build:
  configure: cmake -B build -DCMAKE_INSTALL_PREFIX=$PREFIX
  compile: cmake --build build -j$JOBS
  install: cmake --install build

environment:
  PATH: $PREFIX/bin
  LD_LIBRARY_PATH: $PREFIX/lib

5.3.1.2. Manifest with Build Options

HepSW supports configurable builds through user-selectable options. These options are discovered automatically by analyzing CMake projects (via Seemake) and can be specified during installation:

name: root
version: 6.30.02
description: CERN ROOT Data Analysis Framework

source:
  type: git
  url: https://github.com/root-project/root.git
  tag: v6-30-02

dependencies:
  - name: cmake
    version: ">=3.20"
  - name: python
    version: ">=3.8"
    optional: false

options:
  - name: builtin_openssl
    description: Build OpenSSL internally
    default: ON
    type: bool
  - name: pyroot
    description: Enable Python bindings
    default: ON
    type: bool
  - name: tmva
    description: Enable TMVA machine learning
    default: ON
    type: bool

build:
  configure: |
    cmake -S . -B build \
      -DCMAKE_INSTALL_PREFIX=$PREFIX \
      -Dbuiltin_openssl=$OPT_builtin_openssl \
      -Dpyroot=$OPT_pyroot \
      -Dtmva=$OPT_tmva
  compile: cmake --build build -j$JOBS
  install: cmake --install build

environment:
  PATH: $PREFIX/bin
  LD_LIBRARY_PATH: $PREFIX/lib
  PYTHONPATH: $PREFIX/lib
  ROOTSYS: $PREFIX

When installing, users can specify options:

hepsw build root --options pyroot=ON,tmva=OFF

For project-specific workflows, HepSW can suggest optimal configurations:

hepsw build root --project fcc
# Automatically enables FCC-relevant options

5.3.1.3. Manifest Storage

Manifests are stored separately from HepSW itself in the hepsw-package-index repository. This design allows:

  • Community contributions without modifying HepSW core

  • Version control of package definitions

  • Independent evolution of packages and tool

  • Easy forking for experiment-specific stacks

5.3.2. Dependency Management: Handling Version Constraints

HepSW’s dependency resolver is designed to handle the complex version relationships common in HEP software stacks:

digraph dependency_resolution { rankdir=TB; node [shape=box, style="rounded,filled", fillcolor=lightblue]; request [label="User Request\n(root, geant4)", fillcolor=lightgreen]; parse [label="Parse Manifests"]; graph [label="Build Dependency\nGraph"]; conflicts [label="Check Version\nConflicts", shape=diamond, fillcolor=lightyellow]; sat [label="SAT Solver\nResolution"]; multi [label="Build Multiple\nVersions"]; order [label="Topological Sort\n(Build Order)"]; build [label="Execute Builds", fillcolor=lightcoral]; request -> parse; parse -> graph; graph -> conflicts; conflicts -> sat [label="conflicts"]; conflicts -> order [label="compatible"]; sat -> multi [label="no solution"]; sat -> order [label="resolved"]; multi -> order; order -> build; }

5.3.2.1. Version Constraint Syntax

HepSW supports semantic versioning with standard constraint operators:

  • >=3.20 - Minimum version (inclusive)

  • ~1.2 - Patch version updates allowed (1.2.0, 1.2.1, but not 1.3.0)

  • ^2.0 - Minor version updates allowed (2.0.0, 2.1.0, but not 3.0.0)

  • ==6.28.0 - Exact version required

  • >=3.0,<4.0 - Range specification

5.3.2.2. Resolution Strategy

The resolver operates in several phases:

  1. Graph Construction: Parse all manifests for requested packages and recursively collect dependencies

  2. Constraint Collection: Gather all version constraints from the dependency tree

  3. Conflict Detection: Identify cases where different packages require incompatible versions

  4. Resolution:

    • If all constraints are compatible: select highest compatible version for each package

    • If conflicts exist: invoke SAT solver to find compatible version set

    • If no solution exists: build multiple versions of conflicting packages in isolated prefixes

  5. Build Ordering: Topological sort ensures dependencies are built before dependents

5.3.2.3. Handling Conflicting Dependencies

Example scenario:

  • User wants ROOT 6.30 + Geant4 11.2

  • ROOT 6.30 requires Python >=3.9,<3.12

  • Geant4 11.2 requires Python >=3.11

Resolution approach:

  1. SAT solver determines Python 3.11 satisfies both constraints

  2. Single Python 3.11 installation is built

  3. Both ROOT and Geant4 link against this shared Python

Example unresolvable conflict:

  • Package A requires OpenSSL 1.1.x

  • Package B requires OpenSSL 3.x

Resolution approach:

  1. SAT solver detects incompatibility

  2. HepSW builds both OpenSSL 1.1 and OpenSSL 3.x in separate prefixes

  3. Package A links against OpenSSL 1.1, Package B against OpenSSL 3.x

  4. User is warned about the dual installation

5.3.2.4. Optional Dependencies

Some package features are only enabled if certain dependencies are present:

dependencies:
  - name: python
    version: ">=3.8"
    optional: false  # Required
  - name: cuda
    version: ">=11.0"
    optional: true   # Optional, enables GPU support

Users can control optional dependencies:

# Enable optional dependencies
hepsw build root --with-optional

# Disable specific optional dependency
hepsw build root --without cuda

5.3.3. Build Engine: Orchestrating the Build

The build engine executes package builds in distinct, logged phases:

digraph build_phases { rankdir=TB; node [shape=box, style="rounded,filled", fillcolor=lightblue]; start [label="Start Build", shape=ellipse, fillcolor=lightgreen]; fetch [label="Fetch Phase\nDownload Sources"]; configure [label="Configure Phase\nPrepare Build"]; compile [label="Compile Phase\nBuild Binaries"]; install [label="Install Phase\nCopy to Prefix"]; register [label="Register Phase\nRecord Metadata"]; done [label="Build Complete", shape=ellipse, fillcolor=lightgreen]; fail [label="Build Failed", shape=ellipse, fillcolor=lightcoral]; start -> fetch; fetch -> configure [label="success"]; fetch -> fail [label="error"]; configure -> compile [label="success"]; configure -> fail [label="error"]; compile -> install [label="success"]; compile -> fail [label="error"]; install -> register [label="success"]; install -> fail [label="error"]; register -> done; }

5.3.3.1. Build Phases in Detail

1. Fetch Phase

  • Downloads source code to workspace/sources/<package>/<version>/

  • Supports: git repositories (tags, branches, commits), tarballs (HTTP/HTTPS), local paths

  • Uses source cache to avoid redundant downloads

  • Verifies checksums if provided in manifest

2. Configure Phase

  • Prepares the build system (CMake, Autotools, etc.)

  • Sets up environment variables:

    • $PREFIX: Installation target directory

    • $JOBS: Parallel build jobs (from -j flag)

    • $CMAKE_PREFIX_PATH: Paths to dependencies

    • $PKG_CONFIG_PATH: For pkg-config detection

  • Runs configuration commands from manifest

  • Creates build directory in workspace/builds/<package>/<version>/

3. Compile Phase

  • Executes compilation commands

  • Supports parallel builds (defaults to number of CPU cores)

  • Streams output to both terminal and log file

  • Can be interrupted and resumed

4. Install Phase

  • Installs built artifacts to isolated prefix: workspace/install/<package>/<version>/

  • Standard directory structure:

    install/<package>/<version>/
    ├── bin/          # Executables
    ├── lib/          # Libraries
    ├── include/      # Headers
    ├── share/        # Data files
    └── etc/          # Configuration
    
  • No system directories touched (no /usr/local, no sudo required)

5. Register Phase

  • Records build metadata (timestamp, options used, dependency versions)

  • Generates environment scripts

  • Updates package database for dependency tracking

5.3.3.2. Key Design Decisions

Isolated Prefixes Each package/version combination gets its own installation directory. This enables:

  • Multiple versions of the same package to coexist

  • Clean uninstallation (just delete the directory)

  • No conflicts between packages

  • Easy binary distribution (tarball the prefix)

No Sudo Required Everything installs to user-writable workspace directories. This is critical for:

  • Cluster environments where users lack root

  • Development workflows where iteration is frequent

  • Reproducibility (no system state modifications)

Parallel Builds Respects -j flag for parallel compilation:

hepsw build root -j8  # Use 8 parallel jobs

Defaults to nproc (number of CPU cores) if not specified.

Incremental Builds HepSW detects when sources haven’t changed and can skip redundant work:

hepsw build root --incremental  # Reuse existing build

Comprehensive Logging Each build phase logs to workspace/logs/<package>-<version>-<phase>.log:

logs/
├── root-6.30.02-fetch.log
├── root-6.30.02-configure.log
├── root-6.30.02-compile.log
├── root-6.30.02-install.log
└── root-6.30.02-register.log

This makes debugging much easier—you can pinpoint exactly which phase failed and why.

5.3.4. Workspace Layout: Where Everything Lives

The workspace is the central organizational structure for all HepSW operations:

digraph workspace_layout { rankdir=TB; node [shape=folder, style=filled, fillcolor=lightyellow]; workspace [label="workspace/", fillcolor=lightgreen]; sources [label="sources/\nDownloaded source code"]; builds [label="builds/\nBuild directories (temporary)"]; install [label="install/\nInstalled software"]; env [label="env/\nEnvironment scripts"]; logs [label="logs/\nBuild logs"]; cache [label="cache/\nDownloaded tarballs, git repos"]; workspace -> sources; workspace -> builds; workspace -> install; workspace -> env; workspace -> logs; workspace -> cache; pkg_sources [label="<package>/<version>/", shape=box]; sources -> pkg_sources; pkg_builds [label="<package>/<version>/", shape=box]; builds -> pkg_builds; pkg_install [label="<package>/<version>/\n bin/\n lib/\n include/\n share/", shape=box]; install -> pkg_install; }

5.3.4.1. Directory Structure

workspace/
├── sources/              # Downloaded source code
│   └── <package>/
│       └── <version>/    # e.g., root/6.30.02/
├── builds/               # Build directories (can be cleaned)
│   └── <package>/
│       └── <version>/
├── install/              # Installed software (isolated by package/version)
│   └── <package>/
│       └── <version>/
│           ├── bin/      # Executables
│           ├── lib/      # Libraries
│           ├── include/  # Headers
│           └── share/    # Data files, docs
├── env/                  # Generated environment scripts
│   ├── root-6.30.02.sh
│   └── geant4-11.2.0.sh
├── logs/                 # Build logs for debugging
│   ├── root-6.30.02-configure.log
│   ├── root-6.30.02-compile.log
│   └── root-6.30.02-install.log
└── cache/                # Downloaded tarballs, git repos (shared across builds)
    ├── tarballs/
    └── git/

5.3.4.2. Why This Layout?

Multiple Versions Coexist install/root/6.28.0 and install/root/6.30.0 can exist side-by-side. Users choose which to activate via environment scripts.

Clean Rebuilds Delete builds/ directory without losing installed software. Source code in sources/ can also be cleaned after successful builds.

Disk Space Management

  • cache/ persists across builds to avoid redownloading

  • sources/ and builds/ are ephemeral (can be cleaned)

  • install/ contains only what you actually use

Debugging Support Every build has its own log file. If compilation fails, inspect the exact error:

cat workspace/logs/root-6.30.02-compile.log

Portability The entire workspace can be tarred up and moved to another machine (as long as the OS/architecture match):

tar czf my-hep-stack.tar.gz workspace/

5.3.5. Environment Management: Making Software Usable

After building, software needs to be discoverable by the shell and other tools. HepSW generates environment scripts that configure the necessary variables:

digraph environment_generation { rankdir=LR; node [shape=box, style="rounded,filled", fillcolor=lightblue]; installed [label="Installed\nPackages", fillcolor=lightgreen]; metadata [label="Package\nMetadata"]; template [label="Environment\nTemplate"]; generate [label="Script\nGenerator"]; script [label="Shell Script", shape=note, fillcolor=lightyellow]; installed -> metadata; metadata -> generate; template -> generate; generate -> script; }

5.3.5.1. Generated Environment Scripts

For a single package:

# workspace/env/root-6.30.02.sh
export PATH="/path/to/workspace/install/root/6.30.02/bin:$PATH"
export LD_LIBRARY_PATH="/path/to/workspace/install/root/6.30.02/lib:$LD_LIBRARY_PATH"
export PYTHONPATH="/path/to/workspace/install/root/6.30.02/lib:$PYTHONPATH"
export ROOTSYS="/path/to/workspace/install/root/6.30.02"
export CMAKE_PREFIX_PATH="/path/to/workspace/install/root/6.30.02:$CMAKE_PREFIX_PATH"

5.3.5.2. Usage Patterns

Single Package Environment

source $(hepsw env path root)
root -b  # Now works

Combined Environment

hepsw env generate --packages root,geant4,pythia8 > my-analysis.sh
source my-analysis.sh

Named Environments

# Create named environment for specific workflow
hepsw env create fcc-analysis --packages root,geant4,pythia8,fastjet

# Later, activate it
hepsw env activate fcc-analysis

Project-Specific Environments HepSW can suggest packages for known projects:

hepsw env create --project fcc
# Suggests and installs: Key4HEP stack, FCCSW, Gaudi, etc.

hepsw env create --project dune
# Suggests and installs: LArSoft, ROOT, Geant4, etc.

5.3.5.3. Design Philosophy: Simple is Better

We use plain shell scripts instead of complex module systems (Environment Modules, Lmod) because:

  1. Transparency: You can read and understand exactly what the script does

  2. Debuggability: Easy to trace issues (set -x, inspect variables)

  3. Portability: Works on any shell (bash, zsh, dash)

  4. No Dependencies: No additional tools required

  5. Integration: Advanced users can integrate with existing module systems if desired

If your institution uses module systems, HepSW can generate module files:

hepsw env generate --format modulefile --package root > root/6.30.02

5.4. Comparison with Other Tools

HepSW is not the first tool to tackle build automation in scientific computing. Understanding how it relates to existing tools helps clarify its design choices:

5.4.1. vs Spack

Spack is a mature, general-purpose package manager for HPC:

Similarities:

  • Both build from source

  • Both support multiple versions

  • Both handle complex dependency graphs

  • Both generate environment modules

Differences:

Aspect

Spack

HepSW

Scope

General HPC (10,000+ packages)

HEP-focused (~100 packages)

Manifest Language

Python DSL

YAML (declarative)

Learning Curve

Steep (Python API, complex syntax)

Gentle (readable YAML)

Build Variants

Comprehensive but complex

Simple, user-friendly options

Documentation

Package-centric

Build guides + usage tutorials

Target Audience

HPC system administrators

HEP physicists and developers

When to use Spack:

  • You need non-HEP software (compilers, MPI, etc.)

  • You’re a system administrator managing a cluster

  • You need advanced features (compiler bootstrapping, microarchitecture optimization)

When to use HepSW:

  • You’re a HEP physicist setting up your analysis environment

  • You want transparency and simplicity

  • You need HEP-specific optimizations and workflows

  • You want documentation that explains how to use the software, not just how to build it

5.4.2. vs EasyBuild

EasyBuild is another HPC-focused build framework:

Similarities:

  • Source-based builds

  • Reproducibility focus

  • Module file generation

Differences:

Aspect

EasyBuild

HepSW

Configuration

Python-based “easyconfigs”

YAML manifests

Philosophy

System-wide installations

User-space workspaces

Toolchains

Rigid toolchain definitions

Flexible, minimal constraints

Target Environment

Clusters with shared filesystem

Local workstations + clusters

5.4.3. vs Conda/Mamba

Conda is a popular binary package manager:

Fundamental Difference: Conda distributes pre-built binaries; HepSW builds from source.

When Conda Works Well:

  • Python-heavy workflows

  • Standard packages with binaries on conda-forge

  • Quick setup without compilation

When HepSW is Better:

  • Latest versions not yet on conda-forge

  • Custom build configurations needed

  • Source-level debugging required

  • Binary compatibility issues with your system

  • You want to understand how software is built

Complementary Use: Many users combine them:

conda create -n hep python=3.11 cmake numpy  # Basic tools
conda activate hep
hepsw build root geant4  # HEP-specific software from source

5.4.4. vs Nix

Nix provides purely functional package management:

Similarities:

  • Reproducible builds

  • Multiple versions coexist

  • Declarative configuration

Differences:

Aspect

Nix

HepSW

Model

Functional, immutable

Conventional, mutable

Learning Curve

Very steep

Gentle

OS Integration

Deep (can replace OS package manager)

Shallow (workspace-based)

Adoption Barrier

High (requires buying into Nix philosophy)

Low (works like traditional tools)

5.4.5. vs Containers (Docker, Singularity)

Fundamental Difference: Containers package entire environments; HepSW builds software.

When Containers Work Well:

  • Production: reproducible deployment of complete analysis chains

  • Sharing: distribute entire environment to collaborators

  • Isolation: completely separate from host system

When HepSW is Better:

  • Development: iterative builds, testing changes

  • Flexibility: mix and match versions, custom builds

  • Transparency: inspect and modify any component

  • Size: install only what you need (not multi-GB images)

Complementary Use: Build software with HepSW, then package in container for production:

# Development: use HepSW
hepsw build root geant4 my-analysis

# Production: containerize the workspace
FROM ubuntu:22.04
COPY workspace /opt/hep-workspace
RUN echo 'source /opt/hep-workspace/env/my-analysis.sh' >> ~/.bashrc

5.4.6. Summary: HepSW’s Unique Position

HepSW occupies a specific niche:

digraph tool_comparison { rankdir=TB; node [shape=box, style="rounded,filled"]; subgraph cluster_general { label="General-Purpose"; style=filled; fillcolor=lightgray; spack [label="Spack\n(HPC)", fillcolor=lightblue]; easybuild [label="EasyBuild\n(HPC)", fillcolor=lightblue]; nix [label="Nix\n(Universal)", fillcolor=lightblue]; } subgraph cluster_hep { label="HEP-Specific"; style=filled; fillcolor=lightgreen; hepsw [label="HepSW\n(HEP Physics)", fillcolor=yellow]; containers [label="Containers\n(Production)", fillcolor=lightcyan]; } subgraph cluster_binary { label="Binary Distribution"; style=filled; fillcolor=lightyellow; conda [label="Conda\n(Python/Data Science)", fillcolor=lightblue]; cvmfs [label="CVMFS\n(HEP Binary Cache)", fillcolor=lightblue]; } }

HepSW’s sweet spot:

  • HEP physicists who need source builds

  • Local development environments

  • Understanding and transparency

  • Gentle learning curve

  • Project-specific optimization (FCC, DUNE, etc.)

It’s not trying to replace Spack for HPC administrators, nor Conda for Python environments, nor containers for production. It’s designed specifically for HEP developers and users who want control, transparency, and simplicity.

5.5. Developer Architecture

This section is for contributors working on HepSW itself.

5.5.1. Code Organization

hepsw/
├── cmd/hepsw/              # CLI entry point
│   └── main.go            # Cobra command setup
├── internal/              # Internal packages (not importable)
│   ├── cli/              # Command implementations
│   │   ├── build.go
│   │   ├── env.go
│   │   ├── init.go
│   │   └── list.go
│   ├── manifest/         # Manifest parsing and validation
│   │   ├── parser.go
│   │   ├── validator.go
│   │   └── types.go
│   ├── builder/          # Build orchestration
│   │   ├── engine.go
│   │   ├── phases.go
│   │   └── logger.go
│   ├── resolver/         # Dependency resolution
│   │   ├── graph.go
│   │   ├── sat.go
│   │   └── version.go
│   ├── workspace/        # Workspace management
│   │   ├── layout.go
│   │   └── cache.go
│   └── environment/      # Environment script generation
│       ├── generator.go
│       └── templates.go
├── pkg/                  # Public packages (importable)
│   └── types/           # Shared types
└── docs/                # Documentation (Sphinx + Markdown)

5.5.2. Key Abstractions

digraph key_abstractions { rankdir=TB; node [shape=box, style="rounded,filled", fillcolor=lightblue]; manifest [label="Manifest\nYAML definition", fillcolor=lightgreen]; package [label="Package\nParsed manifest + metadata"]; depgraph [label="DependencyGraph\nPackages + edges"]; buildplan [label="BuildPlan\nOrdered build sequence"]; builder [label="Builder\nExecutes build phases"]; workspace [label="Workspace\nManages filesystem layout"]; manifest -> package [label="parse"]; package -> depgraph [label="resolve"]; depgraph -> buildplan [label="sort"]; buildplan -> builder [label="execute"]; builder -> workspace [label="uses"]; }

Manifest → Raw YAML representation Package → Parsed and validated package definition DependencyGraph → DAG of packages with version constraints BuildPlan → Topologically sorted list of builds to execute Builder → Executes build phases for a single package Workspace → Manages directories, caching, paths

5.5.3. Extension Points

HepSW is designed to be extensible:

Custom Source Fetchers Add support for new source types:

type Fetcher interface {
    Fetch(source Source, destDir string) error
    IsCached(source Source) bool
}

Build System Support Add support for non-CMake builds:

type BuildSystem interface {
    Configure(pkg Package, buildDir string) error
    Compile(pkg Package, buildDir string) error
    Install(pkg Package, buildDir, installDir string) error
}

Dependency Resolvers Implement custom resolution strategies:

type Resolver interface {
    Resolve(packages []Package) (BuildPlan, error)
}

5.5.4. Testing Strategy

digraph testing_strategy { rankdir=TB; node [shape=box, style="rounded,filled", fillcolor=lightblue]; unit [label="Unit Tests\nIndividual functions"]; integration [label="Integration Tests\nComponent interactions"]; e2e [label="End-to-End Tests\nComplete workflows"]; unit -> integration [label="builds on"]; integration -> e2e [label="builds on"]; fixtures [label="Test Fixtures\nMinimal CMake projects", shape=cylinder, fillcolor=lightyellow]; e2e -> fixtures [style=dashed]; }

Unit Tests: Test individual functions (manifest parsing, version comparison) Integration Tests: Test component interactions (resolver + builder) End-to-End Tests: Test complete workflows with minimal real packages

Test fixtures include simple CMake projects that mimic HEP software structure but compile in seconds.

5.5.5. Performance Considerations

Parallel Dependency Builds Independent packages can be built in parallel:

hepsw build root geant4 pythia8 -j4  # Build 4 packages concurrently

Implementation uses goroutines with dependency tracking to maximize parallelism.

Incremental Compilation HepSW detects when source hasn’t changed and can reuse build artifacts:

  • Hash source directory

  • Compare with previous build hash

  • Skip configure/compile if unchanged

Caching Strategy

  • Source cache: Persist git clones and downloaded tarballs

  • Build cache: Optionally preserve build directories

  • Binary cache (future): Share built packages across workspaces

5.6. Future Enhancements

5.6.1. Planned Features

Binary Caching Build once, share across machines:

# Machine A
hepsw build root --cache-upload

# Machine B (same OS/arch)
hepsw build root --cache-fetch  # Skip compilation

Cross-Compilation Build for different architectures:

hepsw build root --target aarch64-linux-gnu

Distributed Builds Offload compilation to build servers:

hepsw build root --distributed

Environment Snapshots Capture exact environment for reproducibility:

hepsw env snapshot > my-analysis.lock
# Later, reproduce exactly
hepsw env restore my-analysis.lock

CI/CD Integration GitHub Actions and GitLab CI templates:

- uses: hepsw/setup@v1
  with:
    packages: root geant4

5.6.2. Research Directions

  • Automatic dependency discovery from CMakeLists.txt (deeper Seemake integration)

  • Machine learning for build optimization (predict optimal compiler flags)

  • Provenance tracking (record exact commit hashes, build machine details)

  • Incremental environment updates (change one package without rebuilding everything)

5.7. Contributing to Architecture

The architecture is not set in stone. We welcome discussions about:

  • Alternative dependency resolution strategies

  • Better workspace organization schemes

  • Performance optimizations

  • New use cases that don’t fit the current model

Please open an issue on GitHub to discuss architectural changes before implementing them.


Next: Workspace Layout and Workflow