Name That Part: 3D Part Segmentation and Naming

By Soumava Paul* | December 24, 2025 | Hacker News: Front Page

Motivation

Many vision and graphics applications require 3D parts, not just whole-object labels: robots must grasp handles, and creators need editable, semantically meaningful components. This requires solving two problems at once: segmenting parts and naming them. While part-annotated datasets exist, their label definitions are often inconsistent across sources, limiting robust training and evaluation. Existing approaches typically cover only one side of the problem: segmentation-only models produce unnamed regions, while language-grounded systems often retrieve one part at a time and fail to produce a complete named decomposition.

Introduction

ALIGN-Parts reframes named 3D part segmentation as a set-to-set alignment problem. Instead of labeling each point independently, we predict a small set of partlets. Each partlet represents one part with (i) a soft segmentation mask over points and (ii) a text embedding that can be matched to part descriptions. We then align predicted partlets to candidate descriptions via bipartite matching, enforcing permutation consistency and allowing a null option so the number of parts can adapt per shape.

To make partlets both geometrically separable and semantically meaningful, we fuse (1) geometry from a 3D part-field backbone, (2) multi-view appearance features lifted onto 3D, and (3) semantic knowledge from LLM-generated, affordance-aware descriptions (e.g., “the horizontal surface of a chair where a person sits”).

Bare part names can be ambiguous across categories (e.g., “legs”). ALIGN-Parts trains with LLM-generated affordance-aware descriptions (embedded with a sentence transformer) to disambiguate part naming during set alignment.

ALIGN-Parts. Fuse geometry + appearance, learn part-level partlets, and align them to affordance-aware text embeddings for fast, one-shot segmentation and naming.

Training losses

Setup & notation.
We represent a 3D shape as a point set $\mathcal{P}=\{\mathbf{x}_i\}_{i=1}^N$ (sampled from a mesh/point cloud). The model predicts $K$ partlets, each with mask logits $\mathbf{m}_k\in\mathbb{R}^{N}$ and a text embedding $\hat{\mathbf{z}}_k\in\mathbb{R}^{d_t}$. Ground truth provides $A$ part masks $\mathbf{m}^{\mathrm{gt}}_a\in\{0,1\}^{N}$ with text embeddings $\hat{\mathbf{t}}_a\in\mathbb{R}^{d_t}$. A differentiable set-matching step (Sinkhorn) yields an assignment $\pi(k)\in\{1,\ldots,A\}\cup\{\emptyset\}$; let $\mathcal{M}=\{k:\pi(k)\neq\emptyset\}$ denote the matched partlets.

Text alignment (InfoNCE). Makes partlet embeddings nameable by pulling matched (partlet, text) pairs together and pushing others apart.

$$
L_{\text{text}}=\frac{1}{|\mathcal{M}|}\sum_{k\in\mathcal{M}} -\log\frac{\exp(\hat{\mathbf{z}}_k\cdot\hat{\mathbf{t}}_{\pi(k)}/\tau)} {\sum_{a=1}^{A}\exp(\hat{\mathbf{z}}_k\cdot\hat{\mathbf{t}}_a/\tau)}
$$

Mask supervision (BCE + Dice). Encourages accurate part boundaries and robust overlap with ground-truth parts. Here $\sigma$ is the sigmoid applied to the mask logits.

$$
L_{\text{mask}}=\frac{1}{|\mathcal{M}|}\sum_{k\in\mathcal{M}} \Big[\mathrm{BCE}(\mathbf{m}_k,\mathbf{m}^{\mathrm{gt}}_{\pi(k)}) +\big(1-\mathrm{Dice}(\sigma(\mathbf{m}_k),\mathbf{m}^{\mathrm{gt}}_{\pi(k)})\big)\Big]
$$

Partness loss. Learns when a partlet should be “active” vs. “no-part”, enabling variable part counts. Here $\text{part}_k$ denotes the predicted partness score of partlet $k$.

$$
L_{\text{part}}=\frac{1}{K}\sum_{k=1}^{K}\mathrm{BCE}(\text{part}_k,\mathbf{1}[\pi(k)\neq\emptyset])
$$

Regularizers. Reduce over/under-segmentation and prevent multiple partlets from claiming the same points.

$$
L_{\text{cov}}=\frac{1}{|\mathcal{M}|}\sum_{k\in\mathcal{M}} \left|\frac{\sum_i \sigma(m_{ki})-\sum_i m^{\mathrm{gt}}_{\pi(k)i}}{N}\right| \qquad L_{\text{overlap}}=\frac{1}{N}\sum_{i=1}^{N}\Big(\sum_{k=1}^{K}\sigma(m_{ki})-1\Big)^2
$$

Total objective.
A weighted sum of the above terms (plus an auxiliary global alignment loss, which is not written out in the expression):

$$
L_{\text{total}}= \lambda_{\text{mask}}L_{\text{mask}}+ \lambda_{\text{part}}L_{\text{part}}+ \lambda_{\text{text}}L_{\text{text}}+ \lambda_{\text{cov}}L_{\text{cov}}+ \lambda_{\text{ov}}L_{\text{overlap}}
$$

Experiments

We evaluate ALIGN-Parts on named 3D part segmentation across 3DCoMPaT++, PartNet, and Find3D, using class-agnostic segmentation (mIoU) and two label-aware metrics, LA-mIoU (strict) and rLA-mIoU (relaxed), that measure whether predicted parts are named correctly. ALIGN-Parts outperforms strong baselines while avoiding slow, post-hoc clustering, yielding ~100× faster inference. We also align heterogeneous taxonomies via a two-stage pipeline (embedding similarity + LLM validation), enabling unified training on consistent part semantics and supporting...
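As a rough illustration of the training losses defined earlier, the following pure-Python sketch evaluates each term on nested lists. It is not the authors' code. The function names, the temperature `tau=0.07`, the Dice smoothing `eps`, the unit loss weights, and the choice to treat the partness score as a logit are all assumptions made for the example. The auxiliary global alignment loss is left out because the text does not define it.

```python
import math

def sigmoid(x):
    # Branch on sign so large negative logits do not overflow exp().
    return 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))

def bce_logit(x, y):
    # Numerically stable binary cross-entropy for a logit x and a target y in {0, 1}.
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))

def matched(pi):
    # The set M: partlet indices k whose assignment pi[k] is not None (the null option).
    return [k for k, a in enumerate(pi) if a is not None]

def text_loss(z, t, pi, tau=0.07):
    # InfoNCE over matched partlets; z is K x d, t is A x d, tau is an assumed value.
    M, total = matched(pi), 0.0
    for k in M:
        logits = [sum(p * q for p, q in zip(z[k], ta)) / tau for ta in t]
        top = max(logits)
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum - logits[pi[k]]
    return total / len(M) if M else 0.0

def mask_loss(m, gt, pi, eps=1e-6):
    # BCE on mask logits plus (1 - Dice) on sigmoid probabilities; eps is an assumed smoother.
    M, total = matched(pi), 0.0
    for k in M:
        g = gt[pi[k]]
        probs = [sigmoid(x) for x in m[k]]
        bce = sum(bce_logit(x, y) for x, y in zip(m[k], g)) / len(g)
        inter = sum(p * y for p, y in zip(probs, g))
        dice = (2.0 * inter + eps) / (sum(probs) + sum(g) + eps)
        total += bce + (1.0 - dice)
    return total / len(M) if M else 0.0

def partness_loss(part_logits, pi):
    # Target is 1 for matched partlets and 0 for those assigned to null.
    targets = [1.0 if a is not None else 0.0 for a in pi]
    return sum(bce_logit(x, y) for x, y in zip(part_logits, targets)) / len(pi)

def coverage_loss(m, gt, pi):
    # Mean absolute gap between predicted and ground-truth mask size, as a fraction of N.
    M, n = matched(pi), len(m[0])
    total = sum(abs(sum(sigmoid(x) for x in m[k]) - sum(gt[pi[k]])) / n for k in M)
    return total / len(M) if M else 0.0

def overlap_loss(m):
    # Penalises points whose summed mask probability over all partlets differs from 1.
    n = len(m[0])
    return sum((sum(sigmoid(row[i]) for row in m) - 1.0) ** 2 for i in range(n)) / n

def total_loss(m, part_logits, z, gt, t, pi, w=None):
    # Weighted sum of the five terms; the unit weights are placeholders.
    w = w or {"mask": 1.0, "part": 1.0, "text": 1.0, "cov": 1.0, "ov": 1.0}
    return (w["mask"] * mask_loss(m, gt, pi) + w["part"] * partness_loss(part_logits, pi)
            + w["text"] * text_loss(z, t, pi) + w["cov"] * coverage_loss(m, gt, pi)
            + w["ov"] * overlap_loss(m))
```

Here `m` is a K x N list of mask logits, `gt` an A x N list of binary masks, `z` and `t` the partlet and description embeddings, and `pi` a length-K list holding a part index or `None`. A practical implementation would vectorize these loops in an array library.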
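The set-matching step that produces the assignment $\pi$ can be illustrated in the same spirit. The text states only that Sinkhorn matching is used and that a null option exists. The sketch below therefore rests on several assumptions: a K x A score matrix is given (how the scores are built is not specified), the null option is modeled as an extra column with a constant `null_score`, the values of `tau` and `iters` are placeholders, and the hard assignment is read off the soft plan with a simple greedy pass. It also assumes K >= A.

```python
import math

def sinkhorn_with_null(scores, null_score=0.0, tau=0.1, iters=50):
    """Soft assignment of K partlets to A parts plus one null column.

    scores: K x A list of similarities (higher is better), with K >= A.
    Returns a K x (A + 1) transport plan; the last column is the null option.
    """
    K, A = len(scores), len(scores[0])
    # Gibbs kernel with an appended null column (hypothetical constant score).
    kernel = [[math.exp(s / tau) for s in row] + [math.exp(null_score / tau)]
              for row in scores]
    # Each partlet sends one unit of mass; each real part receives one unit;
    # the null column absorbs the remaining K - A units.
    col_mass = [1.0] * A + [float(K - A)]
    u, v = [1.0] * K, [1.0] * (A + 1)
    for _ in range(iters):
        u = [1.0 / sum(kernel[k][a] * v[a] for a in range(A + 1)) for k in range(K)]
        v = [col_mass[a] / sum(kernel[k][a] * u[k] for k in range(K))
             for a in range(A + 1)]
    return [[u[k] * kernel[k][a] * v[a] for a in range(A + 1)] for k in range(K)]

def decode(plan):
    """Greedy one-to-one read-out: pi[k] is a part index, or None for null."""
    K, A = len(plan), len(plan[0]) - 1
    pairs = sorted(((plan[k][a], k, a) for k in range(K) for a in range(A)),
                   reverse=True)
    pi, used = [None] * K, set()
    for mass, k, a in pairs:
        # Keep a pair only if both sides are free and it beats the null column.
        if pi[k] is None and a not in used and mass > plan[k][A]:
            pi[k] = a
            used.add(a)
    return pi
```

For example, `decode(sinkhorn_with_null(scores))` returns a length-K list that can serve as the `pi` in the loss definitions. Because the iteration uses only multiplications and divisions, it stays differentiable when written in an autodiff framework, whereas the greedy read-out does not.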
