
🚀 Template Nix Sail: A Complete Reproducible Development Environment


In a previous post we explored how to integrate Sail with Nix using two separate flakes (server and client). Now we take it further: a unified template that simplifies the entire process and adds modern development tools.

The goal is an environment where a single command gets everything ready to work with PySpark or PySail, with no manual configuration and no global dependencies.


🎯 What is template-nix-sail?

It’s a development template that combines Nix flakes, direnv, Python 3.12, PySpark and PySail, plus pytest, ruff, and ptpython, with everything configured to work automatically when you enter the directory.


🐳 Why Nix Instead of Docker?

A common question: why not use Docker for this? Both solve the reproducibility problem, but with different approaches.

| Aspect | Nix | Docker |
| --- | --- | --- |
| Execution | Native on your system | Inside container |
| I/O performance | No overhead | Bind mounts slow (especially on macOS) |
| Editor/IDE | Works directly | Need devcontainers or extra config |
| Cache | Per individual package | Per image layer |
| Daemon | Not needed | Requires Docker daemon running |
| Shell/aliases | Your normal config | Separate config inside container |
| Disk space | Only what’s needed | Base images + layers |
| File permissions | No issues | UID/GID can cause problems |

In summary: Nix is lighter and more natural for local development. Your editor, terminal, and tools work without additional configuration. Docker is still excellent for deployment and when you need complete isolation.


🧩 Project Structure

template-nix-sail/
├── flake.nix           # Nix shell definitions
├── flake.lock          # Dependencies lockfile
├── .envrc              # direnv configuration
├── pyproject.toml      # Python project configuration
├── .env                # Environment variables
│
├── src/                # Source code
│   ├── main.py         # Interactive demo
│   ├── calculator.py   # Example functions
│   └── dataframes.py   # DataFrame operations
│
├── tests/              # Test suite
│   ├── conftest.py     # pytest fixtures
│   └── test_*.py       # Tests
│
└── resources/
    └── ciudades_espana.csv  # Sample dataset

🔧 Two Development Shells

The flake.nix defines two environments based on your needs:

| Shell | Command | Java | Use Case |
| --- | --- | --- | --- |
| default | nix develop | Yes (JDK 17) | Traditional PySpark + PySail |
| pysail | nix develop .#pysail | No | PySail only (lighter) |

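When to use each one? Use default when you need traditional PySpark (which requires the JVM) alongside PySail; use pysail when PySail alone is enough and you want a lighter shell without Java.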


⚡ Getting Started

1. Clone the template

git clone https://github.com/davidlghellin/template-nix-sail.git
cd template-nix-sail

2. Activate the environment

nix develop

Or if you use direnv (recommended), the environment activates automatically when entering the directory.

3. Ready!

When activating the shell, Nix:

  1. Sets up Python 3.12 and JDK 17 (if applicable)
  2. Creates a .venv-nix virtualenv
  3. Installs all dependencies automatically
  4. Defines useful aliases

You’ll see a sailboat animation (⛵) while dependencies are being installed.


🛠️ Available Aliases

Once inside the environment, you have these shortcuts:

| Alias | Command | Description |
| --- | --- | --- |
| t | pytest -v | Run tests |
| ts | SPARK_BACKEND=pysail pytest -v | Tests with PySail |
| tp | SPARK_BACKEND=pyspark pytest -v | Tests with PySpark |
| r | ruff check . | Check linting |
| rf | ruff check --fix . && ruff format . | Fix and format |

🎮 Included Demo

The template includes a functional demo with a dataset of 100 Spanish cities:

python src/main.py

The demo shows:

All using PySail by default (no Java needed).


🔀 Backend Selection

The SPARK_BACKEND variable controls which engine to use:

# PySail (default) - No Java, faster
SPARK_BACKEND=pysail python src/main.py

# Traditional PySpark - With Java/JVM
SPARK_BACKEND=pyspark python src/main.py
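On the Python side, reading that variable could look roughly like the sketch below. The function name selected_backend and the VALID_BACKENDS tuple are illustrative assumptions, not names taken from the template; only the SPARK_BACKEND variable and its two values come from the commands above.

```python
import os

# Assumption: these are the only two values accepted, with "pysail" as the
# default, mirroring the shell commands shown above.
VALID_BACKENDS = ("pysail", "pyspark")


def selected_backend(environ=None):
    """Return the backend named by SPARK_BACKEND, defaulting to "pysail"."""
    env = os.environ if environ is None else environ
    # Normalise whitespace and case so " PySail " is treated like "pysail".
    backend = env.get("SPARK_BACKEND", "pysail").strip().lower()
    if backend not in VALID_BACKENDS:
        raise ValueError(
            f"SPARK_BACKEND must be one of {VALID_BACKENDS}, got {backend!r}"
        )
    return backend


if __name__ == "__main__":
    print(f"Using backend: {selected_backend()}")
```

Failing early on an unknown value keeps a typo in the variable from silently falling back to the default engine.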

The code automatically detects if a Spark Connect server is available. If not found, it starts an internal one.
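A minimal sketch of that detect-or-start logic, using only the standard library, is shown here. It is not the template's actual code: server_is_reachable, resolve_remote, and the start_internal callable are hypothetical names, and port 50051 is simply the port used in the REPL example later in this post.

```python
import socket


def server_is_reachable(host, port, timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or unresolved host: treat as absent.
        return False


def resolve_remote(host="localhost", port=50051, start_internal=None):
    """Return a Spark Connect URL, starting an internal server if needed.

    start_internal is a hypothetical callable that launches an in-process
    server and returns the port it listens on.
    """
    if server_is_reachable(host, port):
        return f"sc://{host}:{port}"
    if start_internal is None:
        raise RuntimeError(f"No Spark Connect server at {host}:{port}")
    return f"sc://localhost:{start_internal()}"
```

The returned URL can then be passed to SparkSession.builder.remote(...). A plain TCP probe only shows that the port is open, not that a Spark Connect server is behind it, so treat this as a cheap first check.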


🧪 Integrated Testing

The project includes pytest tests that work with both backends:

# Tests with default backend (PySail)
pytest -v

# Unit tests only
pytest -m unit -v

# Tests forcing PySpark
SPARK_BACKEND=pyspark pytest -v

The fixtures in conftest.py automatically handle creation and cleanup of Spark sessions.
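The create-then-clean-up pattern such a fixture relies on can be sketched as a plain generator. This is an illustration under assumptions, not the template's conftest.py: spark_session_lifecycle and create_session are hypothetical names, and the commented fixture shows only one possible way to wire it into pytest.

```python
def spark_session_lifecycle(create_session):
    """Yield a session from create_session and always stop it afterwards."""
    session = create_session()
    try:
        yield session
    finally:
        # Runs when the generator is resumed or closed, even after a failure.
        session.stop()


# In conftest.py this generator could back a session-scoped fixture
# (assumed wiring; the remote URL reuses the port from the REPL example):
#
#     import pytest
#     from pyspark.sql import SparkSession
#
#     @pytest.fixture(scope="session")
#     def spark():
#         yield from spark_session_lifecycle(
#             lambda: SparkSession.builder.remote(
#                 "sc://localhost:50051"
#             ).getOrCreate()
#         )
```

Keeping the lifecycle separate from the pytest decorator makes the cleanup behaviour easy to check with a fake session object, without starting any Spark backend.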


📝 Enhanced REPL with ptpython

The template includes ptpython, an enhanced Python REPL. A session against a Spark Connect server looks like this:

ptpython
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()
>>> spark.sql("SELECT 1 + 1").show()

🌟 Advantages of This Approach

Compared to the previous setup with separate server and client flakes:

  1. Single project: Everything in one place, easier to maintain
  2. Two backends: Flexibility to use PySail or PySpark as needed
  3. Auto-server: No need to manually start a server
  4. Modern tooling: pytest, ruff, ptpython already configured
  5. Functional demo: Real example code, not just a “hello world”
  6. Direnv: Automatic environment activation

🚧 Next Steps

This template is a starting point. Some ideas to expand it:


📚 Resources


💭 Final Thoughts

With Nix and this template, setting up a Spark development environment reduces to a single command. No more “works on my machine”, no more dependency conflicts.

It’s the perfect combination for learning, experimenting, and prototyping with Spark in a reproducible way.

Hope you find it useful! If you have ideas or improvements, contributions are welcome. 🚀