I'm very happy to chat about research ideas and collaborate with people. Please feel free to reach out to me if you are interested in discussing or working together!
|
Research
My research focuses on multimodal learning and reasoning, and I am also interested in AI for Science.
|
|
MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
Yifan Jiang*,
Jiarui Zhang*,
Kexuan Sun*,
Zhivar Sourati,
Kian Ahrabian,
Kaixin Ma,
Filip Ilievski,
Jay Pujara
NeurIPS, 2024
arXiv
A new comprehensive benchmark, MARVEL, that evaluates the abstract reasoning abilities of multimodal large language models, revealing significant performance gaps between humans and state-of-the-art MLLMs.
|
|
The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models
Kian Ahrabian*,
Zhivar Sourati*,
Kexuan Sun*,
Jiarui Zhang,
Yifan Jiang,
Fred Morstatter,
Jay Pujara
COLM, 2024
arXiv
A study of nonverbal reasoning abilities of multi-modal large language models using variations of Raven's Progressive Matrices.
|
|
Guided Profile Generation Improves Personalization with Large Language Models
Jiarui Zhang
EMNLP, 2024
arXiv
We propose guided profile generation to enhance personalization with large language models and evaluate its effectiveness on three popular personalization tasks.
|
|
Exploring Perceptual Limitation of Multimodal Large Language Models
Jiarui Zhang*,
Jinyi Hu*,
Mahyar Khayatkhoei,
Filip Ilievski,
Maosong Sun
arXiv,
GitHub
We expose a limitation of several state-of-the-art multimodal LLMs in perceiving small visual objects.
We then identify four factors that influence this limitation: object quality, size, distractors, and location. Through controlled intervention studies, we reveal the distinct impact of each factor. Our findings may offer insights into improving the visual processing capabilities of MLLMs.
|
|
Towards Perceiving Small Visual Details with Multimodal LLMs
Jiarui Zhang,
Mahyar Khayatkhoei,
Prateek Chhikara,
Filip Ilievski
NeurIPS R0-FoMo Workshop, 2023
arXiv,
GitHub
We qualitatively and quantitatively show the limitation of two state-of-the-art multimodal LLMs (MLLMs) in perceiving small visual details for zero-shot visual question answering. We then show that this limitation can be mitigated by visual cropping guided by the MLLM's internal attention.
|
|
A Study of Situational Reasoning for Traffic Understanding
Jiarui Zhang,
Filip Ilievski,
Kaixin Ma,
Aravinda Kollaa,
Jonathan Francis,
Alessandro Oltramari
KDD, 2023
arXiv,
GitHub
We formalize three novel text-based benchmarks in the traffic domain, covering decision making, causal reasoning about real and hypothetical events, and knowledge testing. We then study the abilities of diverse knowledge-enhanced language models on these benchmarks.
|
|
Knowledge-enhanced Agents for Interactive Text Games
Prateek Chhikara,
Jiarui Zhang,
Filip Ilievski,
Jonathan Francis,
Kaixin Ma
KCAP, 2023
🏆🏆 Best Student Paper Award 🏆🏆
We introduce a knowledge-injection framework that enhances the functional grounding of agents in text-based games, addressing existing limitations in coherence, contextual awareness, and learning. The framework employs strategies such as knowledge graphs and input encoding augmentations. Through experiments on 10 tasks in the ScienceWorld environment, the study reveals how task properties, model architectures, and domain knowledge interact in interactive contexts.
|
|
A Study of Zero-shot Adaptation with Commonsense Knowledge
Jiarui Zhang,
Filip Ilievski,
Kaixin Ma,
Jonathan Francis,
Alessandro Oltramari
AKBC, 2022
arXiv,
GitHub
We train language models of different sizes using synthetic data from knowledge graphs and observe significant zero-shot performance improvements across different language tasks. We also study the effect of the knowledge graph training data size and find that more data does not always lead to better performance, and that the optimal data size grows with model size.
|
Miscellanea
I enjoy weight lifting in my free time.
Recently, I have also been enjoying cooking.
I like eating burgers.
|
This website is adapted from here.
|
|