I was an MSc student in the Stanford Vision and Learning Lab, working with Prof. Jiajun Wu and Prof. Fei-Fei Li.
I previously received dual B.S. degrees in Mechanical Engineering from Shanghai Jiao Tong University and Purdue University, where I was fortunate to be advised by Prof. Karthik Ramani on Human-Computer Interaction.
I support diversity, equity, and inclusion. If you would like to chat with me about research, career plans, or anything else, feel free to reach out! I would be happy to support people from underrepresented groups in the STEM research community, and I hope my expertise can help you.
[Jun 2023] "The Design of a Virtual Prototyping System for Authoring Interactive VR Environments from Real World Scans" was accepted to JCISE.
[Feb 2023] "The ObjectFolder Benchmark: Multisensory Object-Centric Learning with Neural and Real Objects" was accepted to CVPR'23.
[Jan 2023] "Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear" was accepted to ICRA'23.
[Aug 2022] "See, Hear, Feel: Smart Sensory Fusion for Robotic Manipulation" was accepted to CoRL'22.
[Sep 2021] Started at Stanford as an MSc student in Mechanical Engineering.
Research
I'm broadly interested in artificial intelligence and robotics, including but not limited to perception, planning, control, hardware design, and human-centered AI.
The goal of my research is to build agents that achieve human-level learning and adapt to novel, challenging scenarios by leveraging multisensory information such as vision, audio, and touch.
We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning, and the ObjectFolder Real dataset, which includes multisensory measurements for 100 real-world household objects.
We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear. We demonstrate Sonicverse's realism via sim-to-real transfer.
We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor, with all three sensory modalities fused with a self-attention model.
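To illustrate the idea of fusing modalities with self-attention, here is a minimal sketch assuming each modality has already been encoded into a fixed-size feature vector; the module name, dimensions, and pooling choice are illustrative assumptions, not the architecture used in the paper.

```python
# Illustrative sketch: fuse vision, audio, and touch embeddings with self-attention.
# Names and dimensions are hypothetical, not the paper's actual implementation.
import torch
import torch.nn as nn

class MultisensoryFusion(nn.Module):
    def __init__(self, feat_dim=256, num_heads=4):
        super().__init__()
        # One learnable embedding per modality so attention can distinguish them.
        self.modality_embed = nn.Parameter(torch.zeros(3, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, vision_feat, audio_feat, touch_feat):
        # Each input: (batch, feat_dim), e.g. from per-modality encoders.
        tokens = torch.stack([vision_feat, audio_feat, touch_feat], dim=1)
        tokens = tokens + self.modality_embed          # (batch, 3, feat_dim)
        fused, _ = self.attn(tokens, tokens, tokens)   # self-attention across modalities
        return self.proj(fused.mean(dim=1))            # pooled fused feature

# Usage: the fused feature could feed a downstream manipulation policy head.
fusion = MultisensoryFusion()
v, a, t = (torch.randn(8, 256) for _ in range(3))
policy_input = fusion(v, a, t)  # shape: (8, 256)
```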
Using our VRFromX system, users can select regions of interest (ROIs) in a scanned point cloud or sketch in mid-air with a brush tool to retrieve virtual models, and then attach behavioral properties to them.
Academic Services
Reviewer for CoRL 2023
Teaching
Course Assistant in AA274A: Principles of Robot Autonomy, Stanford University, 2022
Course Assistant in CS231N: Deep Learning for Computer Vision, Stanford University, 2023