Template Nix Sail: A Complete Reproducible Development Environment
In a previous post we explored how to integrate Sail with Nix using two separate flakes (server and client). Now we take it further: a unified template that simplifies the entire process and adds modern development tools.
The goal is an environment where a single command gets everything ready to work with PySpark or PySail, with no manual configuration and no global dependencies.
What is template-nix-sail?
It's a development template that combines:
- Nix for environment reproducibility
- Sail/PySail as an alternative Spark engine (written in Rust, no Java needed)
- Traditional PySpark as a second option
- Modern tooling: pytest, ruff, ptpython
Everything configured to work automatically when you enter the directory.
Why Nix Instead of Docker?
A common question: why not use Docker for this? Both solve the reproducibility problem, but with different approaches.
| Aspect | Nix | Docker |
|---|---|---|
| Execution | Native on your system | Inside container |
| I/O Performance | No overhead | Bind mounts can be slow (especially on macOS) |
| Editor/IDE | Works directly | Need devcontainers or extra config |
| Cache | Per individual package | Per image layer |
| Daemon | Not needed to run your tools (a multi-user install uses nix-daemon only for store operations) | Requires Docker daemon running |
| Shell/aliases | Your normal config | Separate config inside container |
| Disk space | Only what's needed | Base images + layers |
| File permissions | No issues | UID/GID can cause problems |
In summary: Nix is lighter and more natural for local development. Your editor, terminal, and tools work without additional configuration. Docker is still excellent for deployment and when you need complete isolation.
Project Structure
template-nix-sail/
├── flake.nix                  # Nix shell definitions
├── flake.lock                 # Dependencies lockfile
├── .envrc                     # direnv configuration
├── pyproject.toml             # Python project configuration
├── .env                       # Environment variables
│
├── src/                       # Source code
│   ├── main.py                # Interactive demo
│   ├── calculator.py          # Example functions
│   └── dataframes.py          # DataFrame operations
│
├── tests/                     # Test suite
│   ├── conftest.py            # pytest fixtures
│   └── test_*.py              # Tests
│
└── resources/
    └── ciudades_espana.csv    # Sample dataset
Two Development Shells
The flake.nix defines two environments based on your needs:
| Shell | Command | Java | Use Case |
|---|---|---|---|
| default | nix develop | Yes (JDK 17) | Traditional PySpark + PySail |
| pysail | nix develop .#pysail | No | PySail only (lighter) |
When to use each one?
- Default shell: when you need full compatibility with traditional PySpark or want to test both backends.
- Pysail shell: for fast development without Java overhead. Ideal for prototypes and learning.
Getting Started
1. Clone the template
git clone https://github.com/davidlghellin/template-nix-sail.git
cd template-nix-sail
2. Activate the environment
nix develop
Or, if you use direnv (recommended), the environment activates automatically whenever you enter the directory, after a one-time direnv allow.
3. Ready!
When activating the shell, Nix:
- Sets up Python 3.12 and JDK 17 (if applicable)
- Creates a .venv-nix virtualenv
- Installs all dependencies automatically
- Defines useful aliases
You'll see a sailboat animation (⛵) while dependencies are being installed.
Available Aliases
Once inside the environment, you have these shortcuts:
| Alias | Command | Description |
|---|---|---|
| t | pytest -v | Run tests |
| ts | SPARK_BACKEND=pysail pytest -v | Tests with PySail |
| tp | SPARK_BACKEND=pyspark pytest -v | Tests with PySpark |
| r | ruff check . | Check linting |
| rf | ruff check --fix . && ruff format . | Fix and format |
Included Demo
The template includes a functional demo with a dataset of 100 Spanish cities:
python src/main.py
The demo shows:
- Reading CSV with Spark
- Top 10 most populated cities
- Population grouped by autonomous community
- Population density calculation
All using PySail by default (no Java needed).
Backend Selection
The SPARK_BACKEND variable controls which engine to use:
# PySail (default) - No Java, faster
SPARK_BACKEND=pysail python src/main.py
# Traditional PySpark - With Java/JVM
SPARK_BACKEND=pyspark python src/main.py
The code automatically detects whether a Spark Connect server is already available. If none is found, it starts an internal one.
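As a rough illustration of that selection and detection logic, here is a minimal sketch. It is not the template's actual code: the function names (resolve_backend, server_is_listening, connect) are hypothetical, the port 50051 is taken from the REPL example in this post, and the SparkConnectServer usage assumes the pysail API as described in the Sail documentation.

```python
import os
import socket
from typing import Any, Mapping

DEFAULT_BACKEND = "pysail"
SUPPORTED_BACKENDS = ("pysail", "pyspark")

# Keeps a reference to an in-process server so it is not garbage collected.
_embedded_server: Any = None


def resolve_backend(env: Mapping[str, str] = os.environ) -> str:
    """Read SPARK_BACKEND, falling back to PySail when it is unset."""
    backend = env.get("SPARK_BACKEND", DEFAULT_BACKEND).strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported SPARK_BACKEND: {backend!r}")
    return backend


def server_is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def connect(host: str = "localhost", port: int = 50051) -> Any:
    """Reuse a running Spark Connect server, or start an in-process Sail one."""
    global _embedded_server
    # Imported lazily so the helpers above work without PySpark installed.
    from pyspark.sql import SparkSession

    if not server_is_listening(host, port):
        # Assumption: pysail exposes SparkConnectServer as in the Sail docs.
        from pysail.spark import SparkConnectServer

        _embedded_server = SparkConnectServer()
        _embedded_server.start()
        _, port = _embedded_server.listening_address
    return SparkSession.builder.remote(f"sc://{host}:{port}").getOrCreate()
```

The two helpers use only the standard library. Calling connect() requires pyspark, plus pysail for the fallback path, and this sketch covers only the PySail case.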
Integrated Testing
The project includes pytest tests that work with both backends:
# Tests with default backend (PySail)
pytest -v
# Unit tests only
pytest -m unit -v
# Tests forcing PySpark
SPARK_BACKEND=pyspark pytest -v
The fixtures in conftest.py automatically handle creation and cleanup of Spark sessions.
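The create-then-clean-up pattern those fixtures rely on can be sketched as a plain generator. This is an illustrative assumption, not the contents of the template's conftest.py: the names create_session and spark_lifecycle are hypothetical, and the URL is the one used in the REPL example in this post.

```python
from typing import Any, Callable, Iterator


def create_session() -> Any:
    """Build a Spark Connect session (the URL is an assumption)."""
    # Imported lazily so this module loads even without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote("sc://localhost:50051").getOrCreate()


def spark_lifecycle(factory: Callable[[], Any] = create_session) -> Iterator[Any]:
    """Yield a session, then stop it even if the consumer raised an error."""
    session = factory()
    try:
        yield session
    finally:
        session.stop()


# In conftest.py the generator could be wrapped as a session-scoped fixture:
#
#   import pytest
#
#   @pytest.fixture(scope="session")
#   def spark():
#       yield from spark_lifecycle()
```

Passing the factory as a parameter keeps the lifecycle logic testable with a stand-in object, without a running Spark backend.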
Enhanced REPL with ptpython
The template includes ptpython, an enhanced Python REPL with:
- Fuzzy autocompletion (Tab)
- History search (Ctrl+R)
- Syntax highlighting
- Auto-suggestions
ptpython
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()
>>> spark.sql("SELECT 1 + 1").show()
Advantages of This Approach
Compared to the previous setup with separate server and client flakes:
- Single project: Everything in one place, easier to maintain
- Two backends: Flexibility to use PySail or PySpark as needed
- Auto-server: No need to manually start a server
- Modern tooling: pytest, ruff, ptpython already configured
- Functional demo: Real example code, not just a "hello world"
- Direnv: Automatic environment activation
Next Steps
This template is a starting point. Some ideas to expand it:
- Add more sample datasets
- Integrate notebooks with Jupyter
- Create more fixtures for testing
- Document common Spark patterns
Final Thoughts
With Nix and this template, setting up a Spark development environment reduces to a single command. No more "works on my machine", no more dependency conflicts.
It's the perfect combination for learning, experimenting, and prototyping with Spark in a reproducible way.
Hope you find it useful! If you have ideas or improvements, contributions are welcome.