M.S. Thesis · Spring 2026 · Seoul National University

Real-time multimodal 3D reconstruction with tactile-enhanced Gaussian splatting.

GaussianFeels is an online visuo-tactile reconstruction and tracking system built around an explicit object-centric 3D Gaussian map — updated under hand-induced occlusion, tracked when pose supervision is removed, and exported to manipulation from frame zero.

01 / Problem

Why one camera isn't enough

The geometry that matters during in-hand manipulation is the geometry the camera can't see.

During grasping and reorientation, the surfaces a robot needs to reason about are exactly the ones occluded by the hand and fingertips. Tactile sensing measures those regions — but only over a small contact patch. A practical manipulation system needs both.

Modality A · global
RGB-D camera

Dense observations of exposed surfaces. Fails behind the hand and the fingertips during contact.

Modality B · local
DIGIT tactile

Direct contact geometry at the manipulation interface. Tiny footprint — only a few square centimeters.

Joint state
Hand proprioception

Forward-kinematic grasp center, finger poses, contact constraints. Used to seed and bound estimates.

Research claim

An explicit object-centric Gaussian map can serve as the shared state for contact-rich manipulation: updated online from synchronized RGB-D, tactile, and proprioceptive observations — tracked directly when object-pose supervision is removed — and exposed immediately to downstream manipulation modules as a progressively improving object model.

Three problems, one representation

The thesis argues that prior systems pick a representation that solves one of these and forces a re-encode for the other two. Implicit fields render and supervise but don't export cleanly to a policy. Mesh-only pipelines export but can't be updated under contact. Point clouds update easily but render poorly. An explicit Gaussian state covers all three roles — and lets pose tracking sit in the middle of the loop.

Online state

Sparse Gaussian updates with contact-aware population management and an active-budget cap so the runtime stays online.

Tracking reference

The frozen-map signed-distance field exposes a differentiable residual the Theseus optimizer can converge on under heavy occlusion.

Manipulation export

PLY exports and provenance-labeled point clouds hand off to a policy — measured surfaces flagged separately from generated ones.

02 / Method

Sensor ingestion · object-centric Gaussian map · pose tracker · occlusion-aware loss

Four parts, all reconstructing the same object frame.

Sensor ingestion synchronizes RGB-D, segmentation, tactile contacts, and hand state. The Gaussian map lives in object coordinates and is synchronized to world only when rendering or supervision needs it. The pose tracker sits in the middle of the loop — it's the transform that links observations to the map.

[Figure 4.1 diagram. Sensor ingestion (RGB-D + mask: It, Dt, Mt, K, TWC · tactile (DIGIT): Pttac, Nttac heightmap · hand state: ht, FK grasp center · optional GT pose TWO,GT, map/slam only) → synchronized frame Ft, eq. (4.1) → pose tracker (frame-0 coord-descent + tactile ICP gate on fitness, RMSE, Δ · frozen-map SDF residual · two-stage Theseus SE(3) → TtWO) → object-centric map (μiO, qiO, siO, αi, ci · contact-aware spawn + boost · freeze inactive, budget cap · tactile-region density boost, eqs. (4.9)–(4.12) · occlusion-aware reweighting, eqs. (4.13)–(4.17) · frozen anchors qi, ni).]
Figure 4.1 (this site). Per-frame data flow. Pose tracking links the synchronized observations Ft to the Gaussian state in object coordinates; supervision is computed in world frame and reweighted by the occlusion ratio.

2.1 · Frozen-map signed distance

Tracking samples object pixels from the current depth image, transforms those points into the world frame, and minimizes a signed-distance residual against a frozen anchor cloud (qi, ni) sampled from recent keyframes. Weights fall off Gaussian-like with distance; the result is a smooth analogue of a point-cloud SDF.

wᵢ(p) = exp(−‖p − qᵢ‖² / 2σ²)   (4.4)
q̃(p) = Σᵢ wᵢ qᵢ / Σᵢ wᵢ   (4.5)
ñ(p) = Σᵢ wᵢ nᵢ / ‖Σᵢ wᵢ nᵢ‖   (4.6)
dM(p) = (p − q̃(p))ᵀ ñ(p)   (4.7)
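The weighted point-to-plane distance of eqs. (4.4)–(4.7) fits in a few lines. The sketch below is a minimal numpy illustration, assuming an (N, 3) anchor cloud with unit normals; the function name, σ value, and toy anchor patch are illustrative, not the thesis implementation.

```python
import numpy as np

def frozen_map_sdf(p, anchors, normals, sigma=0.01):
    """Smooth point-to-plane signed distance against a frozen anchor cloud.

    Sketch of eqs. (4.4)-(4.7): `anchors` (N, 3) and `normals` (N, 3) stand
    in for the keyframe-sampled pairs (q_i, n_i); sigma is illustrative.
    """
    d2 = np.sum((anchors - p) ** 2, axis=1)              # ||p - q_i||^2
    w = np.exp(-d2 / (2.0 * sigma ** 2))                 # eq. (4.4)
    w_sum = w.sum() + 1e-12
    q_bar = (w[:, None] * anchors).sum(axis=0) / w_sum   # eq. (4.5)
    n_bar = (w[:, None] * normals).sum(axis=0)
    n_bar /= np.linalg.norm(n_bar) + 1e-12               # eq. (4.6)
    return float((p - q_bar) @ n_bar)                    # eq. (4.7)

# Toy check: a flat 4 cm anchor patch at z = 0 with +z normals.
xy = np.stack(np.meshgrid(np.linspace(-0.02, 0.02, 5),
                          np.linspace(-0.02, 0.02, 5)), -1).reshape(-1, 2)
anchors = np.concatenate([xy, np.zeros((25, 1))], axis=1)
normals = np.tile([0.0, 0.0, 1.0], (25, 1))
d0 = frozen_map_sdf(np.array([0.0, 0.0, 0.005]), anchors, normals)  # ≈ 0.005
```

Because the weights fall off smoothly, the returned distance is differentiable in p, which is what lets the tracker backpropagate through it.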

2.2 · Pose objective

For frames after the seed, the runtime samples object pixels, lifts them, and optimizes only the latest pose in a sliding window. The objective combines a frozen-map residual on camera and tactile points with temporal and ICP priors:

minTtWO  Σp∈Pcam dM(TtOW p)² + λtac Σp∈Ptac dM(TtOW p)²
    + λtr ‖tt − tt−1‖² + λrot ‖log(Rt−1ᵀ Rt)‖²
    + λicp,t ‖tt − t̂ticp‖² + λicp,r ‖log(R̂icpᵀ Rt)‖²   (4.8)
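Evaluating eq. (4.8) for a candidate pose can be sketched as below. The thesis minimizes this with Theseus; here the cost is only evaluated, with a stand-in SDF, a simple rotation-angle log map, and illustrative names and weights throughout.

```python
import numpy as np

def rot_angle(Ra, Rb):
    """||log(Ra^T Rb)|| for rotation matrices = the relative rotation angle."""
    c = np.clip((np.trace(Ra.T @ Rb) - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(c)

def pose_cost(R, t, pts_cam, pts_tac, sdf, prior, lam):
    """Scalar value of the objective in eq. (4.8) for candidate pose (R, t).

    `sdf` is any callable d_M(.); `prior` holds (R_prev, t_prev) and the ICP
    estimate (R_icp, t_icp); `lam` maps weight names to lambda values.
    Assumes T^{WO} = (R, t), so world points map to object frame by R^T(p - t).
    """
    to_obj = lambda p: R.T @ (p - t)
    cost = sum(sdf(to_obj(p)) ** 2 for p in pts_cam)                 # camera term
    cost += lam["tac"] * sum(sdf(to_obj(p)) ** 2 for p in pts_tac)  # tactile term
    cost += lam["tr"] * np.sum((t - prior["t_prev"]) ** 2)          # temporal priors
    cost += lam["rot"] * rot_angle(prior["R_prev"], R) ** 2
    cost += lam["icp_t"] * np.sum((t - prior["t_icp"]) ** 2)        # ICP priors
    cost += lam["icp_r"] * rot_angle(prior["R_icp"], R) ** 2
    return cost

# Toy evaluation at the identity pose against a flat stand-in map z = 0.
prior = dict(R_prev=np.eye(3), t_prev=np.zeros(3), R_icp=np.eye(3), t_icp=np.zeros(3))
lam = dict(tac=1.0, tr=1.0, rot=1.0, icp_t=1.0, icp_r=1.0)
plane_sdf = lambda p: p[2]
cost = pose_cost(np.eye(3), np.zeros(3),
                 [np.array([0.0, 0.0, 0.1])], [np.array([0.0, 0.0, 0.2])],
                 plane_sdf, prior, lam)  # 0.1^2 + 0.2^2 = 0.05
```

In the real system only the latest pose in the sliding window is optimized, and the ICP terms drop out whenever the gate rejects the ICP estimate.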

2.3 · Occlusion-aware reweighting

RGB and depth residuals become misleading when the hand fills a large fraction of the object view. The trainer monitors an occlusion ratio ρocc from a dilated foreground-and-edge mask and reweights the four core supervision channels online:

svis = 1 − ρocc(1 − wmin)   (4.14)
λ′rgb = svis λrgb,  λ′depth = svis λdepth   (4.15–4.16)
λ′tactile = (1 + β ρocc) λtactile   (4.17)

With wmin = 0.2 and β = 2, tactile supervision overtakes the unoccluded visual baseline at ρocc ≈ 0.4 — the point at which the hand starts dominating the camera view.
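The schedule in eqs. (4.14)–(4.17) is just two linear ramps; a minimal sketch, with defaults taken from the wmin = 0.2 and β = 2 values quoted above (the actual base λ values are set elsewhere in the thesis):

```python
def reweight(rho_occ, lam_rgb, lam_depth, lam_tac, w_min=0.2, beta=2.0):
    """Occlusion-aware channel reweighting, eqs. (4.14)-(4.17).

    Visual channels shrink linearly toward w_min * lambda as the hand fills
    the view; the tactile channel grows linearly with the occlusion ratio.
    """
    s_vis = 1.0 - rho_occ * (1.0 - w_min)        # eq. (4.14)
    return (s_vis * lam_rgb,                     # eq. (4.15)
            s_vis * lam_depth,                   # eq. (4.16)
            (1.0 + beta * rho_occ) * lam_tac)    # eq. (4.17)

# With the camera fully blocked, visual weights bottom out at w_min
# and tactile weight has tripled (1 + beta).
print(reweight(1.0, 1.0, 1.0, 1.0))  # (0.2, 0.2, 3.0) with defaults
```

Where exactly the boosted tactile weight crosses the unoccluded visual baseline depends on the ratio of the base λ values, which is how the ρocc ≈ 0.4 crossover quoted above arises.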

2.4 · Manipulation-side completion

A second process bootstraps a frame-0 estimate from a single RGBA crop using Hunyuan3D-2-mini, an orientation-variant search, and a registration stage that solves the model-to-object transform. As the SLAM loop accumulates measurements, generated geometry is progressively replaced by measured geometry — with provenance preserved so the policy knows which surfaces are real.
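The replacement step can be pictured as a region-wise merge with per-point provenance. The sketch below is illustrative only, assuming a simple voxel-key notion of "region"; the voxel size, names, and flag encoding are not the thesis code.

```python
import numpy as np

GENERATED, MEASURED = 0, 1  # illustrative provenance flags

def progressive_replace(prior_pts, measured_pts, voxel=0.01):
    """Drop frame-0 prior points in any voxel where measured points exist.

    Every surviving point keeps a provenance flag so a downstream policy can
    tell real (measured) surfaces from generated ones.
    """
    def key(p):
        return tuple(np.floor(p / voxel).astype(int))
    covered = {key(p) for p in measured_pts}
    kept_prior = [p for p in prior_pts if key(p) not in covered]
    pts = np.array(kept_prior + list(measured_pts))
    flags = np.array([GENERATED] * len(kept_prior) + [MEASURED] * len(measured_pts))
    return pts, flags

# A measured point lands in the same 1 cm voxel as the first prior point,
# so that prior point is overwritten; the distant prior point survives.
prior = [np.array([0.0, 0.0, 0.0]), np.array([0.05, 0.0, 0.0])]
measured = [np.array([0.001, 0.0, 0.0])]
pts, flags = progressive_replace(prior, measured)
```

As the episode unfolds, the measured fraction of the cloud grows monotonically while the flag array keeps the measured/generated split visible to the policy.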

03 / Pipeline

Per-frame runtime

Two processes, one shared object frame.

SLAM runs on one GPU, manipulation-side completion on another. The frame-0 payload bootstraps a Hunyuan3D-2-mini prior; later frames progressively replace generated geometry with measured geometry as the episode unfolds.

[Pipeline diagram, t = 0 … N. SLAM process (GPU 0): frame 0 runs the coord-descent search and tactile-aware seed spawn; each later frame runs depth ICP gate → Theseus pose optimization → map sync → loss step, with contact spawn, density boost, freeze-inactive, and occlusion-aware reweighting (λ′rgb, λ′depth, λ′tactile); exports are PLY files, world poses, depth + F-score evaluation, and runtime metrics. Manipulation process (GPU 1): bootstrap at t = 0 from an RGBA crop via Hunyuan3D-2-mini and orientation-variant search; progressive replacement overwrites generated geometry with measured geometry by region (gap-filler debug cloud as internal seam stabilizer), preserving per-point provenance labels; policy export is the point cloud with {measured, generated} flags. Pose-mode strip: map, slam, true_slam share the same SLAM loop with different pose sources feeding TtWO.]

Figure. Two-process runtime. Solid arrows are within-process; dashed arrows are inter-process queue messages. The pose mode strip across the bottom describes which channel feeds TtWO — the rest of the SLAM loop is identical across modes.

04 / Experiments

Live reconstructions · animated

What the system actually does, frame by frame.

Each tile below is an interactive simulation of a different stage of the runtime — drag the timeline to scrub. Real video captures from FeelSight-Sim, FeelSight-Real, and FeelSight-Occlusion will replace these tiles after the final experiment sweep (see status section).

Online Gaussian accumulation · FeelSight-Sim

240 frames · sparse spawn + tactile boost

Pose tracking under occlusion · FeelSight-Occlusion

SLAM mode · ρocc oscillating · tactile-dominant

Progressive replacement · manipulation branch

orange = generated prior · blue = measured

Frame-0 orientation search

staged coord-descent · 15°→8°→3°→1°

Note. These visualizations recreate the qualitative behavior described in Chapters 4–5 of the thesis. Final videos — rendered from saved PLY exports of the 40-trial benchmark — will be linked here once the run completes.

05 / Benchmarks

Reconstruction quality · pose tracking · shape priors

How the system holds up under the saved 40-trial sweep.

Three benchmark groups: (a) reconstruction quality on FeelSight-{Sim, Real, Occlusion}, (b) pose-tracking stability under three pose modes, and (c) frame-0 image-to-3D priors evaluated by aligned F-score at 5 mm.

5.1 · Reconstruction quality (FeelSight family)

F@5 mm ↑ · ADD-S ↓ · runtime FPS ↑ · fraction of trials reaching ≥ 50% measured surface ↑

| Variant | Pose mode | F@5 mm ↑ | ADD-S (mm) ↓ | FPS ↑ | Measured ≥50% ↑ |
|---|---|---|---|---|---|
| FeelSight-Sim | map | — pending — | — | — | — |
| FeelSight-Sim | true_slam | — | — | — | — |
| FeelSight-Real | slam | — | — | — | — |
| FeelSight-Real | true_slam | — | — | — | — |
| FeelSight-Occlusion | slam | — | — | — | — |
| FeelSight-Occlusion | true_slam | — | — | — | — |

Table 6.1. Reserved table for the final SLAM sweep. Numbers will be filled in from the 40-trial Optuna baseline (see Appendix A.1 of the thesis).
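For reference, the F@5 mm metric used in these tables is the harmonic mean of precision and recall at a 5 mm distance threshold. A minimal O(N·M) sketch, assuming the pose alignment has already been applied (real evaluations would use a KD-tree):

```python
import numpy as np

def f_score(pred, gt, tau=0.005):
    """F-score between two (N, 3) point clouds at threshold tau (5 mm here).

    precision: fraction of predicted points within tau of ground truth;
    recall: fraction of ground-truth points within tau of the prediction.
    """
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = float((d.min(axis=1) < tau).mean())
    recall = float((d.min(axis=0) < tau).mean())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A perfect reconstruction scores 1.0 and a completely displaced one scores 0.0, which is why Table 6.3 below pairs the score with qualitative win-rates.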

5.2 · Frame-0 shape-prior comparison · 40 trials · F@5 mm

Below is the partial benchmark already tabulated in the thesis manuscript. Hunyuan3D-2-mini is the deployed prior for the manipulation branch; the others are baselines.

| Method | F@5 mm (mean) | F@5 mm (median) | Wins on cuboid | Wins on articulated | Used in pipeline |
|---|---|---|---|---|---|
| Hunyuan3D-2-mini (deployed) | 0.689 | 0.720 | 24/40 | 31/40 | ✓ frame-0 prior |
| InstantMesh | 0.639 | 0.651 | 29/40 | 8/40 | baseline |
| Fast-SAM3D | 0.633 | 0.648 | — | — | baseline |
| TripoSR | 0.576 | 0.590 | — (high F, blob) | — | baseline |
| RGB2Point | 0.484 | 0.495 | — | — | baseline |
| Geco | 0.400 | 0.411 | — (collapses) | — | baseline |

Table 6.3. The benchmark cannot be interpreted from F-score alone — TripoSR can outscore Hunyuan3D-2-mini on a structured cuboid trial while still rounding the object into a generic blob. Qualitative win-rate on category-faithful shapes is reported alongside F-score.

5.3 · Effect of frame-0 prior on downstream manipulation

The manipulation branch hands a provenance-labeled object cloud to a downstream policy from t = 0. The benchmark evaluates whether the early prior helps or hurts policy success when measured geometry is still sparse.

Success @ t = 0.25 N: — % (frame-0 prior on vs. off, FeelSight-Real)

Success @ t = N: — % (once measured fraction ≥ 80%)

Recovery time: — frames (time to first stable grasp)

Policy disagreement: — (between generated and measured surfaces)

Table 6.5 (reserved). Cards are placeholders — they will be replaced by the final manipulation-prior ablation once policy rollouts are available.

06 / Failure cases

Where the system breaks — and why

Honest failures from pilot runs.

Failure cases matter as much as benchmarks. The thesis catalogues three recurring modes; reproducing the qualitative behavior here lets future work target the right thing.

F1 · Thin / wire-like objects

map population

Anisotropic Gaussians can't shrink below the contact-spawn scale floor, so wire-like geometry ends up covered by splats whose centers sit off the ground-truth line. Tactile contacts help locally but can't carry the rest of the object.

F2 · ICP gate rejects large motion

tracking

When inter-frame translation exceeds 50 mm, the ICP prior is rejected and the tracker falls back to the temporal prior alone. On rapid hand re-grasps this leaves the optimizer with too small a basin and pose lags by 1–2 frames.
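The gating logic behind F2 is a simple conjunction of threshold checks. A sketch, where the 50 mm translation bound comes from the text above but the other three thresholds are illustrative placeholders for the four gates:

```python
def icp_gate(fitness, rmse, dt_norm, drot,
             min_fitness=0.3, max_rmse=0.01, max_dt=0.05, max_drot=0.5):
    """Accept/reject test for the per-frame ICP prior.

    dt_norm is inter-frame translation in meters (max_dt = 0.05 m is the
    50 mm bound from the text); drot is the relative rotation angle in rad.
    min_fitness, max_rmse, and max_drot are assumed values, not thesis ones.
    """
    return (fitness >= min_fitness and rmse <= max_rmse
            and dt_norm <= max_dt and drot <= max_drot)

# A rapid re-grasp with 60 mm of inter-frame motion fails the gate, so the
# tracker falls back to the temporal prior alone.
accepted = icp_gate(fitness=0.9, rmse=0.005, dt_norm=0.06, drot=0.1)  # False
```

Because rejection removes the ICP terms from eq. (4.8) entirely, a single missed gate narrows the optimizer's basin, which is exactly the lag mechanism described above.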

F3 · Mask drift around DIGIT glow

segmentation

MobileSAM occasionally swaps the object mask for the bright DIGIT sensor glow. Median-area selection and the 60 px negative-prompt filter mitigate but don't fully solve it; outliers from drift events propagate into spawn decisions.

What the thesis says about these

Section 7.2 (Limitations) lists each of these together with the implementation hyperparameter that controls it — spawn-scale floor for F1, the four ICP gates for F2, and the MobileSAM area-selection rule for F3 — and Section 7.3 (Future Work) maps each to a concrete next step.

07 / Status

Where the thesis is right now

Manuscript locked, experiments running, defense scheduled.

The thesis is in its final experimental sweep. The companion site updates as results land. Downloadable PDF will appear here after the defense.

Timeline

Aug 2025

Proposal accepted

Visuo-tactile reconstruction with explicit Gaussian map — problem formulation locked.

Dec 2025

SLAM runtime feature-complete

Theseus pose optimizer, frozen-map SDF, occlusion-aware reweighting, contact-aware spawn.

Feb 2026

Manipulation branch landed

Hunyuan3D-2-mini frame-0 prior, orientation-variant search, progressive replacement.

Apr → May 2026

Final experiment sweep

40-trial Optuna baseline running on FeelSight-{Sim, Real, Occlusion}; benchmarks below populate as runs finish.

Jun 2026

Defense

Committee review and oral defense at SNU Mechanical Engineering.

Jul 2026

Final PDF + deposit

Bound deposit, repository freeze, downloadable thesis link goes live on this page.

Build progress

Manuscript: 94%
SLAM runtime: 100%
Manipulation branch: 100%
Experiment sweep: 61%
Figures + tables (final): 42%
Companion site: 85%

Downloads

M.S. Thesis
thesis.pdf
available after defense — Spring 2026
— Pending —
Conference paper
gaussianfeels.pdf
available after submission
— Pending —
Code & checkpoints
github.com/KrishiAttriSNU/GaussianFeels
production codebase — SLAM + manipulation branch
View code