
Davinack Research Group @ Wheaton College


hapnet: installation guide and use
What Hapnet does
Hapnet builds a population-aware haplotype network from an aligned FASTA file. It produces:
- A publication-ready network figure (PNG/PDF/SVG).
- TSV logs including haplotype definitions, sequence-to-haplotype membership, shared haplotypes, and a summary.
Requirements
- Python 3.9+
- An aligned FASTA file (all sequences the same length)
​
Quick start (most users)
- Open Terminal (macOS/Linux) or PowerShell (Windows).
- Create a fresh environment (for example with venv or conda), activate it, and install:
macOS/Linux
pip install hapnet
Windows (PowerShell)
pip install hapnet
- Run Hapnet on your aligned FASTA:
hapnet your_alignment.fasta --out network.png --log-prefix run1
Outputs will be written to your current folder (or wherever you specify).
​
Input FASTA format (important)
1) Sequences must be aligned.
All sequences must be the same length (e.g., output from MAFFT, MUSCLE, or Clustal).
2) Population must be the last underscore-delimited token
Hapnet parses population identity from the final underscore-delimited token in each FASTA header.
Examples:
>Ind1_Pop1
>Ind2_Pop2
>Ind3_Pop2
​
More complex headers are fine as long as population is last:
​
>Ind7_Site1_2019_Pop3
>MN605578_Pneocaeca_RI
​
In these examples, the populations are interpreted as Pop1, Pop2, Pop3, RI.
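The header rule above can be illustrated with a few lines of Python. This is a minimal sketch of the documented behaviour (take the final underscore-delimited token), not hapnet's own code; the function name population_from_header is hypothetical, and the sketch makes no assumption about how hapnet treats whitespace or text after the first space in a header.

```python
def population_from_header(header: str) -> str:
    """Return the final underscore-delimited token of a FASTA header."""
    # Drop the leading '>' and surrounding whitespace, if present.
    name = header.strip().lstrip(">")
    # rsplit with maxsplit=1 splits only at the last underscore.
    return name.rsplit("_", 1)[-1]


# Headers taken from the examples in this guide.
headers = [">Ind1_Pop1", ">Ind2_Pop2", ">Ind7_Site1_2019_Pop3"]
for h in headers:
    print(h, "->", population_from_header(h))
```

Running this over your own headers before calling hapnet is a quick way to confirm that every sequence resolves to the population you expect.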
​
Command-line options
​
hapnet input.fasta --out network.png --log-prefix run1
​
--out can be network.png, network.pdf, or network.svg
--log-prefix sets the prefix for output TSV files (e.g., run1_*)
​
Output files explained
​
If you run:
​
hapnet input.fasta --out network.png --log-prefix run1
​
You will get:
​
1. network.png (the haplotype network figure: nodes sized by frequency, shared haplotypes shown as population pie charts, mutation tick marks on edges)
​
2. run1_haplotypes.tsv (one row per haplotype - haplotype ID, sequence, total count, etc.)
​
3. run1_membership.tsv (maps each input sequence header to its haplotype ID)
​
4. run1_summary.tsv (summary statistics such as total haplotypes and number of private/shared haplotypes)
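The TSV logs can be inspected with Python's standard csv module. The sketch below deliberately assumes nothing about the column names hapnet writes: it only reports whatever header row it finds. The helper name load_tsv is hypothetical; the file name run1_haplotypes.tsv comes from the example run above.

```python
import csv
from pathlib import Path


def load_tsv(handle):
    """Read tab-separated text into (column_names, rows)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    # fieldnames is None for an empty file, so fall back to an empty list.
    return list(reader.fieldnames or []), rows


# Only runs if the log from the example command is present.
path = Path("run1_haplotypes.tsv")
if path.exists():
    with path.open(newline="") as fh:
        columns, rows = load_tsv(fh)
    print("columns:", columns)
    print("rows:", len(rows))
```

Each row is a dictionary keyed by column name, so once you know the real column names you can filter or tally rows directly.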
​
​
​
Recommended Workflow for Real Data Sets
​
1. Align sequences (MAFFT/MUSCLE/etc.).
2. Ensure headers end with _PopulationLabel.
3. Run hapnet.
4. Use the TSV logs for downstream analysis, QC, and manuscript tables.
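If you run hapnet on several alignments, step 3 can be scripted. The following is a minimal sketch that wraps the documented command line (the --out and --log-prefix options shown above) with Python's subprocess module. The function names build_command and run_hapnet are hypothetical helpers, not part of hapnet.

```python
import shutil
import subprocess


def build_command(fasta, out="network.png", log_prefix="run1"):
    """Assemble the hapnet command line documented in this guide."""
    return ["hapnet", str(fasta), "--out", str(out), "--log-prefix", str(log_prefix)]


def run_hapnet(fasta, out="network.png", log_prefix="run1"):
    """Run hapnet, failing early if the executable is not on PATH."""
    if shutil.which("hapnet") is None:
        raise RuntimeError(
            "hapnet not found on PATH; activate the environment where it is installed"
        )
    # check=True raises CalledProcessError if hapnet exits with an error.
    subprocess.run(build_command(fasta, out, log_prefix), check=True)


print(build_command("your_alignment.fasta"))
```

Calling run_hapnet("your_alignment.fasta") from an activated environment is then equivalent to typing the command by hand.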
​
​
​
Troubleshooting
​
"command not found: hapnet"
​
You likely forgot to activate the environment, or you installed hapnet into a different environment from the one you are using.
​
"Sequences are not the same length"
​
Your FASTA is either not aligned or contains sequences with gaps or missing ends. Realign and/or trim so that all sequences have equal length.
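To find which sequences are responsible, you can group headers by sequence length. This is a minimal sketch using only the standard library; read_fasta and lengths_by_header are hypothetical helper names, and the parser is a simple one that assumes a plain FASTA layout (header lines starting with ">", sequence on one or more following lines).

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def lengths_by_header(lines):
    """Map each sequence length to the headers that have it."""
    groups = defaultdict(list)
    for header, seq in read_fasta(lines):
        groups[len(seq)].append(header)
    return dict(groups)


# Toy input: the third record is one character shorter than the others.
example = [">Ind1_Pop1", "ACGT", ">Ind2_Pop2", "ACG-", ">Ind3_Pop2", "ACG"]
print(lengths_by_header(example))
```

For a real file, pass an open file handle, e.g. lengths_by_header(open("your_alignment.fasta")). A correctly aligned file yields exactly one length; any additional lengths point to the records that need realigning or trimming.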
​
"My populations are wrong"
​
hapnet uses the final underscore-delimited token. If your header ends in _2019 or _COI, hapnet will treat that as the population. Fix this by renaming headers so that the population is last.
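Renaming many headers by hand is error-prone, so a small script can help. The sketch below moves one underscore-delimited token to the end of a header; move_token_last is a hypothetical helper, and you need to know which position the population occupies in your own headers.

```python
def move_token_last(header: str, index: int) -> str:
    """Move the underscore-delimited token at `index` to the end of the header."""
    # Preserve a leading '>' if the header has one.
    prefix = ">" if header.startswith(">") else ""
    tokens = header.strip().lstrip(">").split("_")
    # pop() removes the population token; append() places it last.
    tokens.append(tokens.pop(index))
    return prefix + "_".join(tokens)


# Population is the second token (index 1) in this header.
print(move_token_last(">Ind1_Pop1_2019", 1))
# Negative indices count from the end: -2 is the second-to-last token.
print(move_token_last(">Ind5_Pop2_COI", -2))
```

Apply the function to every line that starts with ">" and write the result to a new FASTA file, keeping the original untouched.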
​
"Plot is crowded/nodes overlap"
​
Dense networks can become hard to visualize when there are many haplotypes. The TSV logs remain correct even if the figure is crowded; consider plotting subsets or summarizing.
​
​
Citation
​
If you use hapnet in a published analysis, please cite it as follows:
​
Davinack, A.A. (2026). hapnet: Population-aware haplotype networks in Python (v0.1.0). https://pypi.org/project/hapnet
​
​
​