Augmenting Human Creativity with Machine Learning

Hao-Wen Dong

AAAI 2026

Abstract

In this talk, I will survey my work in three main research directions: 1) generative models for music creation, 2) AI-assisted music creation tools, and 3) multimodal generative models for content creation. In particular, I will discuss our recent work on AI-assisted video editing, which explores novel machine learning models that can cut, select, and rearrange a long video into a short one. In the first project, TeaserGen, we proposed a narration-centered teaser generation system that compresses documentaries longer than 30 minutes into teasers shorter than 3 minutes by leveraging pretrained LLMs and language-vision models. In the second project, REGen, we proposed a retrieval-embedded generation framework that allows an LLM to quote multimodal resources while maintaining a coherent narrative. I will conclude by discussing our future work towards next-generation video editing interfaces built on multimodal LLMs and retrieval-embedded generation, as well as playful human-AI music co-creation systems in which the user controls a music generation system through hand gestures and body movements.

Authors

Hao-Wen Dong

Topics

Artificial Intelligence > Core AI > Human-AI Interaction
Deep Learning > Models > Generative Models

Keywords

retrieval augmented generation, human-AI collaboration, video editing, music generation, multimodal generative model
