MaskLLM introduces learnable semi-structured sparsity to LLMs, enabling efficient pruning while maintaining performance. Using Gumbel Softmax sampling, it learns sparse patterns that help reduce computational overhead during inference, with masks that can transfer across domains for lossless compression. For more details, check out the project page.
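The core idea is to make the choice of sparsity pattern differentiable: for 2:4 sparsity, each group of four weights keeps exactly two, so there are only six candidate binary masks per group, and Gumbel-Softmax lets the model learn a distribution over them end-to-end. Below is a minimal NumPy sketch of that sampling step; the function names, the uniform logit initialization, and the temperature value are illustrative assumptions, not MaskLLM's actual implementation.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    # Relaxed categorical sample via the Gumbel-Softmax trick:
    # add Gumbel noise to logits, then take a temperature-scaled softmax.
    rng = rng or np.random.default_rng(0)
    g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-10) + 1e-10)
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)

# All 6 candidate 2:4 binary masks (choose 2 nonzeros out of 4).
CANDIDATES = np.array([
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
    [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1],
], dtype=float)

def sample_24_mask(logits, tau=1.0, rng=None):
    # logits: (num_groups, 6) learnable scores, one per candidate mask.
    # Returns a soft (num_groups, 4) mask; at low tau it approaches
    # a hard one-hot pick of a single candidate per group.
    probs = gumbel_softmax(logits, tau, rng)  # (num_groups, 6)
    return probs @ CANDIDATES                 # (num_groups, 4)

logits = np.zeros((3, 6))          # hypothetical uniform initialization
mask = sample_24_mask(logits, tau=0.5)
print(mask.shape)                  # (3, 4)
print(np.allclose(mask.sum(axis=1), 2.0))  # each group keeps exactly 2 weights
```

Because every candidate mask has exactly two nonzeros, any convex combination of them sums to 2 per group, so the 2:4 budget holds even for the soft relaxed mask during training; annealing the temperature pushes the mask toward a hard binary pattern for inference.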
LLaVA-3D extends the LLaVA framework to 3D vision-language tasks by integrating 3D object representations with visual question answering. The model lets users interact with 3D scenes in natural language, understanding and answering questions about spatial relations and object properties. LLaVA-3D combines visual cues from both 2D and 3D data to enhance scene comprehension.
Lotus is a diffusion-based model for dense geometry prediction tasks such as depth and normal estimation. It simplifies the diffusion formulation to a single step and directly predicts annotations rather than noise, making it faster and more accurate than conventional multi-step approaches. It achieves state-of-the-art zero-shot performance with minimal training data, showing promise for efficient, high-quality dense prediction.
This book explores how generative models like GANs, VAEs, and transformers can improve recommender systems by enhancing accuracy, diversity, and personalization. It introduces a taxonomy of deep generative models—ID-driven, large language, and multimodal models—and highlights their role in generating structured outputs, handling multimedia content, and enabling more dynamic recommendations in domains such as eCommerce and media.
This CMU course for Fall 2024 offers a comprehensive overview of advanced NLP concepts, including deep learning, machine translation, and language generation. It includes lectures, assignments, and resources designed for students with a foundational understanding of NLP and machine learning. Check out the course page for code and slides, and the playlist for the lecture videos.