To train a machine-learning model to perform a task effectively, such as image classification, the model must be shown thousands, millions, or even billions of example images. Gathering such massive datasets can be difficult, especially when privacy is a concern, as it often is with medical images. One common way to deal with this problem is called “federated learning.” Now, researchers at MIT and DynamoFL, a company that started at MIT, have made it better.
Federated learning is a collaborative technique for training a machine-learning model while keeping sensitive user data private. Hundreds or even thousands of users each train their own model on their own device using their own data. Users then send their models to a central server, which combines them to produce a better model that it sends back to all users.
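To make the aggregation step concrete, here is a minimal sketch of one round of this protocol (often called federated averaging), assuming the model is just a NumPy weight vector and using a toy placeholder for local training; it illustrates the idea rather than the researchers' actual implementation.

```python
import numpy as np

def local_update(global_weights, local_data, lr=0.01, steps=10):
    """Placeholder for on-device training: each user refines a copy of
    the global model on their own private data (here, a toy gradient
    step toward the local data mean)."""
    w = global_weights.copy()
    for _ in range(steps):
        grad = w - local_data.mean(axis=0)   # toy loss: ||w - mean(data)||^2
        w -= lr * grad
    return w

def federated_round(global_weights, all_user_data):
    """One round: every user trains locally, then the server averages
    the returned models into a new global model."""
    local_models = [local_update(global_weights, data) for data in all_user_data]
    return np.mean(local_models, axis=0)

# Toy example: 5 users, each holding 20 private samples of dimension 8.
rng = np.random.default_rng(0)
users = [rng.normal(loc=i, size=(20, 8)) for i in range(5)]
weights = np.zeros(8)
for _ in range(3):
    weights = federated_round(weights, users)
```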
For example, a group of medical institutions spread around the world could use this method to train a machine-learning model that identifies brain cancers in scans, while keeping patient data secure on their local servers.
Federated learning has some significant drawbacks, however. Transferring a large machine-learning model back and forth between each user and the central server incurs high communication costs, especially since the model must be sent dozens or even hundreds of times. And because each user gathers their own data, those data don’t necessarily follow the same statistical patterns, which hurts the performance of the combined model. Finally, the combined model is produced by averaging; it is not personalized for each individual user.
The researchers devised a technique that tackles all three of these federated learning problems at once. Their method shrinks the combined machine-learning model while improving its accuracy and speeding up communication between users and the central server. It also ensures that each user receives a model better suited to their environment, which boosts performance.
Compared with other methods, the researchers were able to drastically reduce the size of the model, cutting communication costs for individual users by a factor of four to six. Their technique also boosted the model’s overall accuracy by roughly 10%.
Numerous papers have each addressed one issue with federated learning, but the challenge was putting it all together. “Simply focusing on personalization or communication efficiency is not enough. We had to make sure we could optimize for everything, so this method could actually be used in the real world,” says Vaikkunth Mugunthan PhD ’22, lead author of a paper that describes the technique.
Mugunthan wrote the paper with senior author Lalana Kagal, a principal research scientist at the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the European Conference on Computer Vision.
Making a model smaller
The system the researchers developed, called FedLTN, is based on an idea in machine learning known as the lottery ticket hypothesis. According to this hypothesis, within very large neural network models there exist much smaller subnetworks that can perform at the same level as the full model. Finding one of these subnetworks is like finding a winning lottery ticket. (LTN stands for “lottery ticket network.”)
Neural networks are machine-learning models, loosely modeled on the human brain, that solve problems using layers of interconnected nodes, or neurons.
Finding a winning lottery ticket inside a neural network is harder than scratching off a game card, though. The researchers must use a process called iterative pruning: if the model’s accuracy is above a set threshold, they remove nodes and the connections between them (much like trimming branches off a bush), and then test the leaner model to see whether its accuracy remains above the threshold.
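As a rough illustration of that prune-and-test loop, the sketch below assumes the model can be treated as a flat weight vector, that pruning means zeroing the smallest-magnitude weights, and that an `evaluate` function (a hypothetical stand-in) returns an accuracy score; it is a simplified sketch, not the paper’s procedure.

```python
import numpy as np

def prune_smallest(weights, mask, fraction=0.2):
    """Zero out a fraction of the smallest-magnitude weights that are
    still active, like trimming the thinnest branches off a bush."""
    active = np.flatnonzero(mask)
    k = int(len(active) * fraction)
    if k == 0:
        return mask
    smallest = active[np.argsort(np.abs(weights[active]))[:k]]
    new_mask = mask.copy()
    new_mask[smallest] = 0
    return new_mask

def iterative_pruning(weights, evaluate, accuracy_threshold=0.9):
    """Keep pruning as long as the pruned model still clears the
    accuracy threshold; stop and keep the last passing mask otherwise."""
    mask = np.ones_like(weights)
    while True:
        candidate = prune_smallest(weights, mask)
        if candidate.sum() == mask.sum():          # nothing left to prune
            return mask
        if evaluate(weights * candidate) < accuracy_threshold:
            return mask                            # last mask that passed
        mask = candidate

# Dummy usage: a toy "accuracy" that rewards keeping large weights.
w = np.random.default_rng(1).normal(size=100)
toy_eval = lambda pruned: np.abs(pruned).sum() / np.abs(w).sum()
mask = iterative_pruning(w, toy_eval, accuracy_threshold=0.8)
```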
Other methods have used this pruning strategy in federated learning to create smaller machine-learning models that can be transferred more efficiently. But while those techniques may speed things up, model performance suffers.
Mugunthan and Kagal applied a few novel techniques to accelerate the pruning process while making the new, smaller models more accurate and better personalized to each user’s environment.
They accelerated pruning by skipping a step in which the remaining parts of the pruned neural network are “rewound” to their original values. Instead, the model is trained before it is pruned, which makes it more accurate and speeds up the pruning process, Mugunthan says.
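In the classic lottery-ticket recipe, the surviving weights would be reset to their initial values and retrained after each pruning pass. The sketch below contrasts that rewinding step with the shortcut described here, in which training simply continues from the already-trained weights; the `train` and `prune` helpers are hypothetical placeholders.

```python
def classic_lottery_ticket(init_weights, train, prune, rounds=3):
    """Standard recipe: train, prune, then rewind the surviving weights
    to their initial values before the next pass."""
    weights, mask = init_weights.copy(), None
    for _ in range(rounds):
        weights = train(weights, mask)
        mask = prune(weights, mask)
        weights = init_weights.copy()        # the "rewind" step
    return weights, mask

def no_rewind_variant(init_weights, train, prune, rounds=3):
    """Shortcut: keep the trained weights after pruning instead of
    rewinding, so each pass starts from an already-accurate model."""
    weights, mask = init_weights.copy(), None
    for _ in range(rounds):
        weights = train(weights, mask)
        mask = prune(weights, mask)          # no reset afterwards
    return weights, mask
```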
To personalize each model for the user’s environment, they were careful not to prune away layers in the network that capture important statistical information about that user’s specific data. In addition, whenever models were combined, information already stored on the central server was reused, which saved time and cut down on repeated rounds of communication.
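One plausible reading of “layers that capture important statistical information” is normalization-style layers whose parameters summarize a user’s data distribution. The sketch below simply exempts such layers from pruning; the naming convention and layer choice are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def build_prune_masks(layer_weights, keep_fraction=0.5):
    """Prune each layer's smallest-magnitude weights, but leave layers
    assumed to carry per-user statistics (here, any layer whose name
    contains 'norm') completely untouched."""
    masks = {}
    for name, w in layer_weights.items():
        if "norm" in name:                    # personalization-critical layer
            masks[name] = np.ones_like(w)
            continue
        k = int(w.size * keep_fraction)
        threshold = np.sort(np.abs(w).ravel())[::-1][k - 1] if k > 0 else np.inf
        masks[name] = (np.abs(w) >= threshold).astype(w.dtype)
    return masks

# Toy model: two dense layers plus a normalization layer.
rng = np.random.default_rng(2)
model = {
    "dense1": rng.normal(size=(8, 8)),
    "norm1":  rng.normal(size=(8,)),
    "dense2": rng.normal(size=(8, 4)),
}
masks = build_prune_masks(model)
```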
They also developed a technique to reduce the number of communication rounds for users with resource-constrained devices, such as a smartphone on a slow network. These users begin the federated learning process with a leaner model that has already been refined by a subset of other users.
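Conceptually, this jump-start could look like the server handing a new, resource-constrained client the current pruned model and a smaller round budget instead of the full-size model; the sizes and round counts below are illustrative numbers, not figures from the paper.

```python
def onboard_client(server_state, client_is_constrained):
    """A new user either starts from the full global model or, if their
    device is resource-constrained, from the leaner model already pruned
    and refined by a subset of earlier users."""
    if client_is_constrained:
        return server_state["pruned_model"], server_state["reduced_rounds"]
    return server_state["full_model"], server_state["default_rounds"]

# Hypothetical server state (sizes and round counts are illustrative only).
server_state = {
    "full_model":     {"size_mb": 45, "weights": None},
    "pruned_model":   {"size_mb": 5,  "weights": None},
    "default_rounds": 100,
    "reduced_rounds": 30,
}
model, rounds = onboard_client(server_state, client_is_constrained=True)
```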
Using lottery ticket networks to their advantage
When FedLTN was put to the test in simulations, it led to better performance and lower communication costs across the board. In one experiment, a model produced by a traditional federated learning approach was 45 megabytes in size, while the model produced with their technique had the same accuracy but was only 5 megabytes. In another test, a state-of-the-art technique required 12,000 megabytes of communication between users and the server to train one model, whereas FedLTN required only 4,500 megabytes.
With FedLTN, the worst-performing clients still saw a performance gain of more than 10%, and the overall model accuracy beat the state-of-the-art personalization algorithm by about 10%, Mugunthan adds.
Now that FedLTN has been developed and refined, Mugunthan is working to incorporate the technique into DynamoFL, a recently launched federated learning startup.
He hopes to keep improving the technique going forward. For instance, the researchers have demonstrated success with datasets that have labels, but applying the same techniques to unlabeled data would be more challenging, he says.
Mugunthan hopes this work inspires other researchers to rethink how they approach federated learning.
“This work emphasizes the value of approaching these problems holistically, rather than focusing only on the individual metrics that need to be improved. Sometimes, improving one metric can actually cause the others to degrade. Instead, we should concentrate on how we can improve a number of things together, which is crucial if these methods are to be used in the real world,” he says.