HandTracker

class HandTracker:
    SUPPORTED_MODELS = {"wilor", "mediapipe", "hamer"}

    def __init__(
        self,
        model: str = "wilor",
        model_path: str | None = None,
        **kwargs,
    )

Unified wrapper over three hand-tracking backends. The constructor delegates to the backend selected by model and forwards **kwargs to that backend's Config dataclass. All three backends return the same HandPose schema.

Constructor

Param	Default	Notes
`model`	`"wilor"`	One of `{"wilor", "mediapipe", "hamer"}`.
`model_path`	`None`	Path to the local model repo (WiLoR, HaMeR). Not needed for MediaPipe.
`**kwargs`	n/a	Forwarded to the backend's `Config`.

HandTracker(model="mediapipe", max_num_hands=2, min_detection_confidence=0.6)
HandTracker(model="wilor",     model_path="/opt/WiLoR")
HandTracker(model="hamer",     model_path="/opt/hamer", body_detector="regnety")
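
The kwargs forwarding can be pictured with a small sketch. It assumes the wrapper looks up a config class by model name and constructs it directly from the kwargs; `build_config` and the `_CONFIGS` registry are hypothetical names, and the two configs are trimmed stand-ins for the real ones listed under Backend configs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaPipeConfig:  # trimmed stand-in, not the full config
    max_num_hands: int = 2
    min_detection_confidence: float = 0.3


@dataclass
class WiLoRConfig:  # trimmed stand-in, not the full config
    wilor_dir: Optional[str] = None
    yolo_conf: float = 0.4


# Hypothetical registry mapping model names to config classes.
_CONFIGS = {"mediapipe": MediaPipeConfig, "wilor": WiLoRConfig}


def build_config(model: str = "wilor", **kwargs):
    """Build the config for `model`, passing kwargs through as fields."""
    if model not in _CONFIGS:
        raise ValueError(f"unsupported model: {model!r}")
    # The dataclass constructor rejects unknown field names with TypeError.
    return _CONFIGS[model](**kwargs)


cfg = build_config("mediapipe", max_num_hands=1)
```

Under this sketch, a misspelled kwarg or one that belongs to a different backend fails at construction time with a TypeError. Whether the real wrapper validates kwargs the same way is not stated here.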

Methods

load

def load(self) -> None

Force-load the backend, moving the model weights onto the GPU. The constructor calls this automatically, so call it again only if you want to time the load.
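
Timing the load could look like the sketch below. `time_load` is a hypothetical helper, and `_StubTracker` is a stand-in that only mirrors the documented `load()` signature so the snippet runs on its own; swap in a real HandTracker instance in practice.

```python
import time


def time_load(tracker) -> float:
    """Return the wall-clock seconds taken by tracker.load()."""
    start = time.perf_counter()
    tracker.load()
    return time.perf_counter() - start


class _StubTracker:
    """Stand-in exposing load() like the tracker documented above."""

    def __init__(self) -> None:
        self.loaded = False

    def load(self) -> None:
        self.loaded = True


tracker = _StubTracker()
seconds = time_load(tracker)
print(f"load took {seconds:.3f} s")
```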

detect_hands

def detect_hands(
    self,
    rgb_or_frame,           # SyncedFrame or (H, W, 3) RGB ndarray
    depth=None,             # (H, W) uint16 mm
    intrinsics=None,        # (3, 3) camera matrix
) -> list[HandPose]

Detect hands as 21 joints each. When rgb_or_frame is a SyncedFrame, the wrapper passes frame.rgb, frame.depth, and frame.depth_K to the backend (falling back to frame.rgb_K).

Returns one HandPose per detected hand. Joints are expressed in the camera optical frame: in metres when depth was available, otherwise in pixel coordinates with z=0.
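
To make the pixel-to-metre step concrete, here is a minimal pinhole back-projection sketch. It is not taken from the backends: the median-in-a-window sampling, the treatment of 0 as an invalid depth reading, and the names `sample_depth_m` and `backproject` are all assumptions for illustration. The inputs follow the shapes documented above: depth in uint16 millimetres and a 3x3 intrinsics matrix.

```python
from statistics import median


def sample_depth_m(depth_mm, u, v, radius=7):
    """Median of the non-zero depth readings around pixel (u, v), in metres.

    Assumes 0 marks an invalid reading. Returns None when the window
    contains no valid reading.
    """
    h, w = len(depth_mm), len(depth_mm[0])
    values = []
    for y in range(max(0, v - radius), min(h, v + radius + 1)):
        for x in range(max(0, u - radius), min(w, u + radius + 1)):
            d = int(depth_mm[y][x])
            if d > 0:
                values.append(d)
    return median(values) / 1000.0 if values else None


def backproject(u, v, z_m, K):
    """Lift pixel (u, v) at depth z_m (metres) into the camera optical frame."""
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    return ((u - cx) * z_m / fx, (v - cy) * z_m / fy, z_m)


# Tiny synthetic example: a flat surface 1 m away with one dropout pixel.
depth = [[1000] * 5 for _ in range(5)]
depth[2][2] = 0
K = [[500.0, 0.0, 2.0], [0.0, 500.0, 2.0], [0.0, 0.0, 1.0]]

z = sample_depth_m(depth, 2, 2, radius=1)
point = backproject(4, 2, z, K)
```

Both functions index with `depth_mm[y][x]` and `K[i][j]`, so they accept nested lists as well as NumPy arrays.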

Backend configs

Each backend has a Config dataclass; pass any field as a kwarg.

WiLoRConfig

@dataclass
class WiLoRConfig:
    wilor_dir: Optional[str] = None
    yolo_conf: float = 0.4
    rescale_factor: float = 2.0
    batch_size: int = 16
    detect_every_n: int = 1
    depth_buffer_size: int = 15
    depth_sample_radius: int = 7
    save_mano_vertices: bool = True
    max_joint_abs: float = 3.0
    min_wrist_depth: float = 0.05
    min_palm_span: float = 0.02
    max_palm_span: float = 0.5

MediaPipeConfig

@dataclass
class MediaPipeConfig:
    max_num_hands: int = 2
    min_detection_confidence: float = 0.3
    min_presence_confidence: float = 0.3
    min_tracking_confidence: float = 0.3
    depth_sample_radius: int = 7
    depth_buffer_size: int = 15
    max_joint_abs: float = 3.0
    min_wrist_depth: float = 0.05

HaMeRConfig

@dataclass
class HaMeRConfig:
    hamer_dir: Optional[str] = None
    checkpoint: Optional[str] = None
    body_detector: str = "vitdet"           # or "regnety"
    body_detector_score_thresh: float = 0.5
    min_hand_keypoints: int = 3
    hand_keypoint_score_thresh: float = 0.5
    rescale_factor: float = 2.0
    batch_size: int = 8
    depth_buffer_size: int = 15
    depth_sample_radius: int = 7
    save_mano_vertices: bool = True
    max_joint_abs: float = 3.0
    min_wrist_depth: float = 0.05
    min_palm_span: float = 0.02
    max_palm_span: float = 0.5
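
The meaning of the sanity-check fields (max_joint_abs, min_wrist_depth, min_palm_span, max_palm_span) is not spelled out here. The sketch below shows one plausible reading inferred from the names alone: reject a hand if any coordinate is implausibly large, if the wrist is too close to the camera, or if the palm size is out of range. The function `is_plausible`, the definition of palm span as the wrist to middle-finger-MCP distance, and the joint indices are assumptions, not the backends' actual logic.

```python
import math

# Indices in the common 21-joint hand layout (assumed to apply here).
WRIST, MIDDLE_MCP = 0, 9


def is_plausible(joints_m, max_joint_abs=3.0, min_wrist_depth=0.05,
                 min_palm_span=0.02, max_palm_span=0.5):
    """Check 21 (x, y, z) joints in metres against simple sanity bounds."""
    # Any coordinate further than max_joint_abs from the camera origin.
    if any(abs(c) > max_joint_abs for joint in joints_m for c in joint):
        return False
    # Wrist closer to the camera than the minimum allowed depth.
    if joints_m[WRIST][2] < min_wrist_depth:
        return False
    # Palm span taken as the wrist to middle-finger-MCP distance (assumed).
    span = math.dist(joints_m[WRIST], joints_m[MIDDLE_MCP])
    return min_palm_span <= span <= max_palm_span


# Synthetic hand: every joint at the wrist except the middle MCP, 9 cm away.
hand = [(0.0, 0.0, 0.5)] * 21
hand[MIDDLE_MCP] = (0.0, -0.09, 0.5)
ok = is_plausible(hand)
```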

MANO extras (WiLoR / HaMeR)

When save_mano_vertices=True (the default for WiLoR and HaMeR), the tracker stashes the following private attributes on each HandPose:

hp._mano_vertices         # (778, 3) float32
hp._mano_global_orient    # (1, 3, 3) float32
hp._mano_hand_pose        # (15, 3, 3) float32
hp._mano_betas            # (10,) float32
hp._pred_cam              # (3,) weak-perspective [s, tx, ty]
hp._pred_cam_t            # (3,)
hp._cam_t                 # (3,)
hp._focal_length          # (2,)
hp._kpts_2d_rgb           # (21, 2) pixel coords in the RGB frame
hp._backend               # "wilor" / "hamer" / "mediapipe"

These are read by session.export and written into annotation.hdf5:/hand-pose. MediaPipe doesn't produce them.

The _mano_* attributes are private (underscore-prefixed) so they don't pollute the public HandPose API; they're only meaningful to the export layer.
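
Code that does need to read these extras should tolerate their absence, since MediaPipe does not produce them. A minimal sketch follows, assuming a missing extra is either an absent attribute or None (which of the two is not stated here). `mano_extras` is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real HandPose instances.

```python
from types import SimpleNamespace

MANO_ATTRS = (
    "_mano_vertices", "_mano_global_orient", "_mano_hand_pose", "_mano_betas",
    "_pred_cam", "_pred_cam_t", "_cam_t", "_focal_length", "_kpts_2d_rgb",
)


def mano_extras(hp) -> dict:
    """Collect the extras present on hp, keyed without the leading underscore."""
    out = {}
    for name in MANO_ATTRS:
        value = getattr(hp, name, None)  # covers both "missing" and "None"
        if value is not None:
            out[name.lstrip("_")] = value
    return out


# Stand-ins for a WiLoR pose (one extra set) and a MediaPipe pose (none).
wilor_like = SimpleNamespace(_mano_betas=[0.0] * 10, _backend="wilor")
mediapipe_like = SimpleNamespace(_backend="mediapipe")

extras = mano_extras(wilor_like)
empty = mano_extras(mediapipe_like)
```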