The analysis of human activities from multimodal data has applications in surveillance, traffic analysis, behavior analysis, human–robot interfaces, and multimedia content analysis. It covers the fundamental tasks of scene analysis: the detection, segmentation, and tracking of people, their representation, and the characterization of their condition. It also covers the modeling of sequential data and their interpretation in the form of gestures, activities, behavior, or social relationships. These problems are addressed through the design of sound algorithms that exploit and extend models and methods from computer vision, machine learning, and multimodal data fusion.