HandTracker
Unified hand tracking wrapper over WiLoR, MediaPipe, and HaMeR backends.
class HandTracker:
    SUPPORTED_MODELS = {"wilor", "mediapipe", "hamer"}
    def __init__(
        self,
        model: str = "wilor",
        model_path: str | None = None,
        **kwargs,
    )
Unified wrapper over three hand-tracking backends. The constructor delegates to the matching backend and forwards **kwargs to that backend's Config dataclass. All three backends return the same HandPose schema.
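The delegation described above can be sketched as follows. This is a minimal illustration, not the library's actual implementation: the _CONFIGS registry, the build_config helper, and the trimmed-down config classes are hypothetical stand-ins. Only the backend names and the field names come from this page.

```python
from dataclasses import dataclass


# Hypothetical, trimmed-down stand-ins for the real Config dataclasses.
@dataclass
class MediaPipeConfig:
    max_num_hands: int = 2
    min_detection_confidence: float = 0.3


@dataclass
class WiLoRConfig:
    yolo_conf: float = 0.4
    batch_size: int = 16


# Hypothetical registry mapping a model name to its Config class.
_CONFIGS = {"mediapipe": MediaPipeConfig, "wilor": WiLoRConfig}


def build_config(model: str = "wilor", **kwargs):
    """Pick the Config class for `model` and forward **kwargs to it."""
    if model not in _CONFIGS:
        raise ValueError(
            f"unsupported model {model!r}; expected one of {sorted(_CONFIGS)}"
        )
    # An unknown keyword raises TypeError from the dataclass constructor.
    return _CONFIGS[model](**kwargs)


cfg = build_config("mediapipe", max_num_hands=1, min_detection_confidence=0.6)
print(cfg)
```

With this shape, a misspelled or backend-specific keyword fails at construction time instead of being silently ignored. Whether the real wrapper behaves the same way is not stated here.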
Constructor
| Param | Default | Notes |
|---|---|---|
| model | "wilor" | One of {"wilor", "mediapipe", "hamer"}. |
| model_path | None | Path to local model repo (WiLoR, HaMeR) or backend dir (EgoBlur). Not needed for MediaPipe. |
| **kwargs | - | Forwarded to the backend's Config. |
HandTracker(model="mediapipe", max_num_hands=2, min_detection_confidence=0.6)
HandTracker(model="wilor", model_path="/opt/WiLoR")
HandTracker(model="hamer", model_path="/opt/hamer", body_detector="regnety")
Methods
load
def load(self) -> None
Force-load the backend (model weights into GPU). Called automatically by the constructor; you only need to call it again if you want to time the load.
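Timing the load can be done with a small wrapper around load(). The sketch below is only illustrative: time_load is a hypothetical helper, and _FakeTracker is a stand-in so the snippet runs without any backend installed. In practice you would pass a real HandTracker instance.

```python
import time


def time_load(tracker) -> float:
    """Return the wall-clock seconds spent inside tracker.load()."""
    start = time.perf_counter()
    tracker.load()
    return time.perf_counter() - start


# Hypothetical stand-in so the sketch runs without any backend installed.
class _FakeTracker:
    def __init__(self):
        self.loaded = False

    def load(self) -> None:
        time.sleep(0.01)  # pretend to load weights
        self.loaded = True


tracker = _FakeTracker()
elapsed = time_load(tracker)
print(f"load() took {elapsed:.3f} s")
```

Because the constructor already calls load, a repeated call may measure a warm reload, not a cold start. Which one you get depends on the backend and is not specified here.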
detect_hands
def detect_hands(
    self,
    rgb_or_frame, # SyncedFrame or (H, W, 3) RGB ndarray
    depth=None, # (H, W) uint16 mm
    intrinsics=None, # (3, 3) camera matrix
) -> list[HandPose]
Detect 21-joint hands. When rgb_or_frame is a SyncedFrame, the wrapper plumbs frame.rgb, frame.depth, and frame.depth_K (or frame.rgb_K as fallback).
Returns a list of HandPose objects, one per detected hand. Joints are expressed in the camera optical frame: in metres when depth was available, otherwise in pixels with z=0.
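To make the pixel-versus-metre distinction concrete, here is the standard pinhole back-projection that turns a pixel plus a depth reading into a camera-frame point. It is a sketch of the general technique, assuming the (3, 3) intrinsics layout and uint16 millimetre depth described above. It is not necessarily how the backends compute joints. The backproject helper and the intrinsic values are illustrative.

```python
def backproject(u: float, v: float, depth_mm: int, K) -> tuple[float, float, float]:
    """Lift pixel (u, v) with depth in millimetres to camera-frame metres."""
    fx, fy = K[0][0], K[1][1]  # focal lengths in pixels
    cx, cy = K[0][2], K[1][2]  # principal point in pixels
    z = depth_mm / 1000.0      # uint16 millimetres -> metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


# Illustrative intrinsics for a 640x480 camera (made-up values).
K = [
    [600.0, 0.0, 320.0],
    [0.0, 600.0, 240.0],
    [0.0, 0.0, 1.0],
]

point = backproject(380.0, 240.0, 500, K)
print(point)
```

The helper indexes K as K[row][col], so it accepts either nested lists or a NumPy array.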
Backend configs
Each backend has a Config dataclass; pass any field as a kwarg.
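Since every Config is a dataclass, the set of valid keyword names can be read with dataclasses.fields before constructing a tracker. The sketch below assumes a trimmed-down stand-in config. The split_kwargs helper is hypothetical and not part of the library.

```python
from dataclasses import dataclass, fields


# Hypothetical, trimmed-down stand-in for a backend Config.
@dataclass
class MediaPipeConfig:
    max_num_hands: int = 2
    min_detection_confidence: float = 0.3


def split_kwargs(config_cls, kwargs: dict) -> tuple[dict, dict]:
    """Split kwargs into (accepted by config_cls, unknown)."""
    names = {f.name for f in fields(config_cls)}
    known = {k: v for k, v in kwargs.items() if k in names}
    unknown = {k: v for k, v in kwargs.items() if k not in names}
    return known, unknown


known, unknown = split_kwargs(
    MediaPipeConfig, {"max_num_hands": 1, "yolo_conf": 0.5}
)
cfg = MediaPipeConfig(**known)
print(cfg, "ignored:", sorted(unknown))
```

A check like this is useful when one set of options is shared across backends, because a field such as yolo_conf exists on WiLoRConfig but not on MediaPipeConfig.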
WiLoRConfig
@dataclass
class WiLoRConfig:
    wilor_dir: Optional[str] = None
    yolo_conf: float = 0.4
    rescale_factor: float = 2.0
    batch_size: int = 16
    detect_every_n: int = 1
    depth_buffer_size: int = 15
    depth_sample_radius: int = 7
    save_mano_vertices: bool = True
    max_joint_abs: float = 3.0
    min_wrist_depth: float = 0.05
    min_palm_span: float = 0.02
    max_palm_span: float = 0.5
MediaPipeConfig
@dataclass
class MediaPipeConfig:
    max_num_hands: int = 2
    min_detection_confidence: float = 0.3
    min_presence_confidence: float = 0.3
    min_tracking_confidence: float = 0.3
    depth_sample_radius: int = 7
    depth_buffer_size: int = 15
    max_joint_abs: float = 3.0
    min_wrist_depth: float = 0.05
HaMeRConfig
@dataclass
class HaMeRConfig:
    hamer_dir: Optional[str] = None
    checkpoint: Optional[str] = None
    body_detector: str = "vitdet" # or "regnety"
    body_detector_score_thresh: float = 0.5
    min_hand_keypoints: int = 3
    hand_keypoint_score_thresh: float = 0.5
    rescale_factor: float = 2.0
    batch_size: int = 8
    depth_buffer_size: int = 15
    depth_sample_radius: int = 7
    save_mano_vertices: bool = True
    max_joint_abs: float = 3.0
    min_wrist_depth: float = 0.05
    min_palm_span: float = 0.02
    max_palm_span: float = 0.5
MANO extras (WiLoR / HaMeR)
When save_mano_vertices=True (the default for WiLoR and HaMeR), the tracker stashes the following private attributes on each HandPose:
hp._mano_vertices # (778, 3) float32
hp._mano_global_orient # (1, 3, 3) float32
hp._mano_hand_pose # (15, 3, 3) float32
hp._mano_betas # (10,) float32
hp._pred_cam # (3,) weak-perspective [s, tx, ty]
hp._pred_cam_t # (3,)
hp._cam_t # (3,)
hp._focal_length # (2,)
hp._kpts_2d_rgb # (21, 2) pixel coords in the RGB frame
hp._backend # "wilor" / "hamer" / "mediapipe"
These are read by session.export and written into annotation.hdf5:/hand-pose. MediaPipe doesn't produce them.
The _mano_* attributes are private (underscore-prefixed) so they don't
pollute the public HandPose API; they're only meaningful to the export
layer.
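Because MediaPipe does not set these attributes, code that reads them should tolerate their absence. A minimal sketch, assuming the attributes are plain instance attributes as listed above: the mano_extras helper is hypothetical, and SimpleNamespace objects stand in for HandPose.

```python
from types import SimpleNamespace

# Attribute names as listed on this page.
MANO_ATTRS = (
    "_mano_vertices",
    "_mano_global_orient",
    "_mano_hand_pose",
    "_mano_betas",
)


def mano_extras(hp) -> dict:
    """Return the MANO attributes present on hp; empty dict if none are set."""
    found = {}
    for name in MANO_ATTRS:
        value = getattr(hp, name, None)
        if value is not None:
            found[name] = value
    return found


# Hypothetical stand-ins for HandPose objects.
hp_mano = SimpleNamespace(_mano_betas=[0.0] * 10)  # e.g. from WiLoR or HaMeR
hp_plain = SimpleNamespace()                       # e.g. from MediaPipe

print(sorted(mano_extras(hp_mano)), mano_extras(hp_plain))
```

Using getattr with a default keeps the same code path working for all three backends without checking which one produced the pose.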
See also
- Hand tracking guide: backend trade-offs.
- HandPose: output schema.
- HDF5 schema → /hand-pose: what gets exported.