Cell Segmentation¶

SMINT provides powerful tools for segmenting cells and nuclei in whole-slide images, with support for distributed processing and multiple GPUs. This comprehensive guide covers all aspects of the segmentation pipeline, from data preparation to post-processing.

Overview of Segmentation Pipeline¶

The SMINT segmentation pipeline consists of several key stages:

Image Preprocessing - Preparing and normalizing input images
Cell Segmentation - Identifying whole cells using Cellpose
Nuclei Segmentation - Identifying cell nuclei (optional)
Post-processing - Refining segmentation results and extracting features
Visualization - Generating visual outputs for quality control

Input Data Requirements¶

Supported Image Formats¶

SMINT supports the following image formats: - OME-TIFF (preferred) - Multi-channel, multi-resolution format with metadata - TIFF - Standard format, single or multi-channel - CZI - Carl Zeiss format, requires additional processing

Required Image Properties¶

For optimal segmentation results, input images should have: - Resolution: Ideally 0.5-1 μm/pixel for cell segmentation, 0.25-0.5 μm/pixel for nuclei - Channels: - At least one membrane/cytoplasm channel (e.g., WGA, phalloidin) - At least one nuclear channel (e.g., DAPI, Hoechst) for nuclei segmentation - Bit depth: 8-bit or 16-bit grayscale per channel - Quality: Minimal noise, good contrast between cells and background

Quick Start¶

Single-Process Segmentation¶

For standard whole-slide images that fit in memory:

python -m scripts.run_segmentation \
    --image path/to/image.ome.tiff \
    --output-dir results/segmentation \
    --cell-channel 0 \
    --nuclei-channel 1 \
    --cell-diameter 60 \
    --nuclei-diameter 30 \
    --visualize

Multi-GPU Distributed Segmentation¶

For very large images requiring multiple GPUs:

python -m scripts.run_distributed_segmentation \
    --image path/to/large_image.ome.tiff \
    --output-dir results/segmentation \
    --cell-channel 0 \
    --nuclei-channel 1 \
    --cell-diameter 60 \
    --nuclei-diameter 30 \
    --chunk-size 2048 2048 \
    --gpus 0 1 2 3 \
    --visualize

Detailed Parameter Guide¶

Common Parameters¶

Parameter	Description	Default	Recommended Range
`--image`	Path to input image	Required	-
`--output-dir`	Directory to save results	Required	-
`--cell-channel`	Channel index for cell segmentation	0	Depends on staining
`--nuclei-channel`	Channel index for nuclei segmentation	1	Depends on staining
`--cell-diameter`	Expected cell diameter in pixels	80	40-120
`--nuclei-diameter`	Expected nuclei diameter in pixels	40	20-60
`--flow-threshold`	Flow threshold for Cellpose	0.4	0.2-0.8
`--cellprob-threshold`	Cell probability threshold	-1.0	-3.0-0.0
`--visualize`	Generate visualizations	False	-
`--chunk-size`	Size of image chunks to process	2048 2048	Depends on GPU memory

Advanced Parameters¶

Parameter	Description	Default	Notes
`--pretrained-model`	Path to custom Cellpose model	None	Use for specialized cell types
`--model-type`	Cellpose model type	"cyto"	Options: "cyto", "nuclei", "cyto2", or custom path
`--min-cell-size`	Minimum cell size in pixels	15	Filters out small objects
`--omit-overlap`	Remove overlapping cell masks	False	Useful for densely packed cells
`--adaptive-threshold`	Use adaptive thresholding	False	Helps with variable image intensity
`--normalize-channels`	Normalize channel intensities	True	Improves segmentation quality
`--save-zarr`	Save results in Zarr format	False	Useful for very large datasets
`--no-nuclei`	Skip nuclei segmentation	False	Speeds up processing

Distributed Processing Parameters¶

Parameter	Description	Default	Notes
`--gpus`	GPU device IDs to use	None	Space-separated list of GPU IDs
`--workers-per-gpu`	Dask workers per GPU	1	Increase for CPU-bound tasks
`--memory-limit`	Memory limit per worker	"10GB"	Adjust based on available RAM
`--scheduler-address`	Dask scheduler address	None	For connecting to existing cluster
`--overlap`	Overlap between chunks	64	Prevents boundary artifacts
`--batch-size`	Number of chunks per batch	4	Adjust based on GPU memory

Segmentation API¶

For programmatic usage within Python scripts:

from smint.segmentation import process_large_image

# Basic usage
results = process_large_image(
    image_path="path/to/image.ome.tiff",
    csv_base_path="results/segmentation",
    chunk_size=(2048, 2048),
    # Cell parameters
    cell_model_path="cyto",
    cells_diameter=80.0,
    cells_flow_threshold=0.4,
    cells_cellprob_threshold=-1.5,
    cells_channels=[0, 0],  # [channel, 0] for grayscale, [channel1, channel2] for RGB
    # Nuclei parameters
    nuclei_model_path="nuclei",
    nuclei_diameter=40.0,
    nuclei_flow_threshold=0.4,
    nuclei_cellprob_threshold=-1.5,
    nuclei_channels=[1, 0],  # [channel, 0] for grayscale
    # Visualization
    visualize=True,
    visualize_output_dir="results/visualization",
    num_visualize_chunks=5,
    visualize_roi_size=(512, 512)
)

# Access the results
cell_outlines = results["cell_outlines"]
nuclei_outlines = results["nuclei_outlines"]

For distributed processing:

from smint.segmentation.distributed_seg import process_large_image_distributed

# Basic distributed usage
results = process_large_image_distributed(
    image_path="path/to/large_image.ome.tiff",
    output_zarr_path="results/segmentation.zarr",
    csv_path="results/segmentation.csv",
    blocksize=(2048, 2048),
    channel=0,  # Main channel for segmentation
    gpus=[0, 1, 2, 3],  # List of GPU IDs
    overlap=64,  # Overlap between chunks
    batch_size=4,  # Number of chunks per batch
    model_type="cyto",  # Cellpose model type
    diameter=80.0,  # Expected cell diameter
    flow_threshold=0.4,
    cellprob_threshold=-1.5
)

Advanced Adaptive Segmentation¶

SMINT supports adaptive segmentation that dynamically adjusts parameters based on local image characteristics:

from smint.segmentation import process_large_image

results = process_large_image(
    # Basic parameters as above, plus:
    enable_adaptive_nuclei=True,
    nuclei_adaptive_flow_min=0.1,
    nuclei_adaptive_flow_step_decrement=0.1,
    nuclei_max_adaptive_attempts=5,
    adaptive_nuclei_trigger_ratio=0.05  # Retry if nuclei count < 5% of cells
)

Output Files and Formats¶

SMINT generates the following output files:

File	Description	Format
`cells_outlines.csv`	Cell outline coordinates	CSV with columns: `cell_id,x,y,chunk_id`
`nuclei_outlines.csv`	Nuclei outline coordinates	CSV with columns: `nuclei_id,x,y,chunk_id`
`cell_features.csv`	Extracted cell features	CSV with measurements for each cell
`segmentation_metadata.json`	Segmentation parameters and stats	JSON
`visualization/*.png`	Visualization images	PNG
`chunks/*.npy`	Raw segmentation masks (if saved)	NumPy arrays
`*.zarr`	Zarr store (for distributed processing)	Zarr directory structure

Live Segmentation Viewer¶

SMINT includes a live segmentation viewer for monitoring the segmentation process in real-time:

from smint.visualization.live_scan_viewer import LiveScanViewer
import tkinter as tk

# Initialize the viewer
root = tk.Tk()
viewer = LiveScanViewer(
    master=root,
    full_scan_path="path/to/image.ome.tiff",
    segmentation_history_dir="results/segmentation",
    tile_info_path="results/tile_info.json",
    update_interval_ms=1000  # Update every 1 second
)

# Start the viewer
viewer.pack(fill=tk.BOTH, expand=True)
root.mainloop()

Common Issues and Troubleshooting¶

Poor Segmentation Quality¶

Problem: Cells or nuclei not properly detected
Solution: Adjust diameter, flow_threshold, and cellprob_threshold. Try using larger diameter for bigger cells, lower flow_threshold for weakly stained cells.

Memory Errors¶

Problem: "CUDA out of memory" or other memory-related errors
Solution: Reduce chunk_size, increase overlap, or use distributed processing with multiple GPUs.

Processing Speed¶

Problem: Segmentation taking too long
Solution: Use multi-GPU processing, reduce visualization, skip nuclei segmentation if not needed.

Boundary Artifacts¶

Problem: Cell masks cut off at chunk boundaries
Solution: Increase overlap between chunks or post-process with stitch_masks=True.

Performance Benchmarks¶

SMINT segmentation performance on different hardware configurations:

Image Size	Hardware	Processing Time	Memory Usage
10k × 10k	Single GPU (RTX 3090)	~5 minutes	~8 GB VRAM
50k × 50k	Single GPU (RTX 3090)	~1 hour	~10 GB VRAM
50k × 50k	4× GPUs (RTX 3090)	~15 minutes	~8 GB VRAM per GPU
100k × 100k	4× GPUs (RTX 3090)	~1 hour	~8 GB VRAM per GPU

Tips for Best Results¶

Image Quality: Start with high-quality, well-stained images for best results
Parameter Tuning: Optimize cell_diameter, flow_threshold, and cellprob_threshold for your specific images
Chunk Size: Balance between processing speed (larger chunks) and memory usage (smaller chunks)
Custom Models: Train custom Cellpose models for specialized cell types
Channel Selection: Choose channels with strongest cell/nuclei signal for segmentation
Validation: Always validate segmentation quality with visualizations
Adaptive Approach: Use adaptive parameters for images with varying intensity or cell density