Training NLU Models on End User Devices with Privacy Protection

Paweł Pardela

supervisor: Artur Janicki



The presentation will show the motivation for choosing the improvement of client-side processing of a voice assistant's natural language understanding (NLU) as the topic of my industrial PhD project. Recent world events have made it clear that privacy is an important factor for any user-facing artificial intelligence system. Users are increasingly aware that their personal data are being shared and stored to feed a single centralized model. The current approach is to use federated learning, which meets the criteria of on-device processing and data privacy protection by design. Early research results suggest that federated models can achieve the same accuracy as their centrally trained deep neural network (DNN) counterparts. With a federated model in place, user data are never shared outside the device. The initial common model is sent to all devices and then retrained on-device with user data. Only the result of that training is shared with the cloud model, and the averaged device models serve as a seed for the next iteration. The weights of a neural network are opaque, which makes privacy attacks considerably harder. However, federated learning increases system complexity and requires additional data transfers. The presentation will show potential research areas, such as increasing the performance of federated learning when applied to training NLU models, or mitigating upload-speed bottlenecks.
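The training loop described above (common model sent to devices, retrained locally, and only the averaged results returned to the cloud) can be sketched with federated averaging. The sketch below is a minimal illustration, not the project's actual implementation: it assumes a toy linear-regression model and purely local simulated clients, and all names (`local_update`, `federated_round`) are hypothetical.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """On-device retraining: a few gradient steps of linear
    regression on the client's private data (never uploaded)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_weights, clients):
    """One iteration: the server sends the common model to all
    devices; only the resulting weights come back and are averaged
    to seed the next round (federated averaging)."""
    local_models = [local_update(global_weights, X, y) for X, y in clients]
    return np.mean(local_models, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Each client holds its own private data shard.
clients = []
for _ in range(4):
    X = rng.normal(size=(20, 2))
    y = X @ true_w + rng.normal(scale=0.01, size=20)
    clients.append((X, y))

w = np.zeros(2)  # initial common model
for _ in range(30):
    w = federated_round(w, clients)
```

Note that only model weights cross the device boundary in `federated_round`; the raw `(X, y)` pairs stay on each client, which is the privacy property the abstract relies on.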