Materialization Trade-offs for Feature Transfer from Deep CNNs for Multimodal Data Analytics
Name: Supun Chathuranga Nakandala
Grad Year: 2022
Deep convolutional neural networks (CNNs) achieve near-human accuracy on many image understanding tasks. This has led to growing interest in using deep CNNs to integrate images with structured data for multimodal analytics, with the aim of improving prediction accuracy in many applications. Because training deep CNNs from scratch is expensive and laborious, transfer learning has become popular: using a pre-trained CNN, one reads off a certain layer of features to represent images and combines them with other features for a downstream ML task. Since no single layer offers the best accuracy in all cases, such feature transfer requires comparing many CNN layers. The current dominant approach to this process on top of scalable analytics systems such as TensorFlow and Spark is fraught with inefficiency due to redundant CNN inference, and it risks system crashes due to manual memory management. We present Vista, the first data system to mitigate such issues by elevating the feature transfer workload to a declarative level and formalizing the data model of CNN inference. Vista enables automated optimization of feature materialization trade-offs, memory usage, and system configuration. Experiments with real-world datasets and deep CNNs show that, apart from enabling seamless feature transfer, Vista helps avoid system crashes and reduces runtimes by 67% to 90%.
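To make the redundant-inference problem concrete, the following is a minimal Python sketch. It is not Vista's implementation. The "CNN" is a hypothetical chain of three toy functions, and every name in it (make_toy_layers, naive_features, staged_features, join_with_structured) is an assumption introduced purely for illustration. It contrasts re-running inference from the raw image for every layer of interest with a single forward pass that materializes each requested layer as it is reached.

```python
from typing import Callable, Dict, List, Sequence

Layer = Callable[[List[float]], List[float]]


def make_toy_layers(counter: Dict[str, int]) -> List[Layer]:
    """Build three stand-in 'layers' (hypothetical, not a real CNN).

    Every layer call increments counter["calls"] so the amount of
    inference work can be compared across strategies.
    """
    def layer(scale: float) -> Layer:
        def apply(x: List[float]) -> List[float]:
            counter["calls"] += 1
            # ReLU-like toy transform standing in for a real CNN layer.
            return [max(0.0, scale * v) for v in x]
        return apply

    return [layer(1.0), layer(2.0), layer(0.5)]


def naive_features(layers: Sequence[Layer], image: List[float],
                   wanted: Sequence[int]) -> Dict[int, List[float]]:
    """Restart inference from the raw image for each requested layer."""
    out: Dict[int, List[float]] = {}
    for k in wanted:
        x = image
        for f in layers[: k + 1]:
            x = f(x)
        out[k] = x
    return out


def staged_features(layers: Sequence[Layer], image: List[float],
                    wanted: Sequence[int]) -> Dict[int, List[float]]:
    """Run one forward pass and keep each requested layer's output."""
    out: Dict[int, List[float]] = {}
    if not wanted:
        return out
    x = image
    for k, f in enumerate(layers[: max(wanted) + 1]):
        x = f(x)
        if k in wanted:
            out[k] = x
    return out


def join_with_structured(structured: List[float],
                         layer_features: List[float]) -> List[float]:
    """Concatenate structured features with one layer's image features."""
    return structured + layer_features
```

In this toy setting, requesting all three layers costs six layer evaluations with the naive strategy and three with the staged one, while both return identical features. The staged pass holds every requested layer's output in memory at once, which hints at the kind of compute-versus-memory trade-off the abstract refers to; how Vista actually resolves that trade-off is not described here.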
Industry Application Area(s)