ROBE Array provides small businesses with access to a popular type of AI

Computer scientists at Rice University have developed a groundbreaking low-memory technique that could make deep-learning recommendation models (DLRM), one of the most resource-intensive forms of AI, accessible to even the smallest of businesses. DLRM recommendation systems are a popular form of artificial intelligence that learns to make suggestions people will actually use, but state-of-the-art training has been out of reach for all but a few tech giants, because the models demand more than a hundred terabytes of memory and processing power on par with a supercomputer. The ROBE Array (Rice’s “random offset block embedding array”) aims to change that. This week, at the Conference on Machine Learning and Systems (MLSys 2022) in Santa Clara, California, the team will present an algorithmic way to greatly reduce the size of the embedding tables used in DLRM.

At MLSys 2022, Shrivastava, Desai, and Chou demonstrated that they could match the training times, and double the inference efficiency, of state-of-the-art DLRM methods that require 100 GB of memory and multiple processors. Shrivastava is an associate professor of computer science at Rice.

According to Shrivastava, the ROBE Array sets a new standard for DLRM compression, and it makes DLRM accessible to users who may not have the resources to train models that are hundreds of terabytes in size.

    Machine learning algorithms that learn from data are the foundation of DLRM systems. A shopping-oriented recommendation system, for instance, would learn from customers’ actual search terms, product suggestions, and final purchases. Adding more divisions to the training data can help improve the precision of the recommendations. For instance, rather than lumping all shampoos together, a business may decide to divide their offerings up by gender and age group.
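To make the categorization concrete, here is a minimal, purely illustrative Python sketch (the feature names and sizes are assumptions, not data from the Rice study) of how each categorical value gets its own learned embedding row, and how splitting one category into finer buckets multiplies the number of rows the model must learn:

```python
# Illustrative only: each category value owns one learned row of an
# embedding table, so finer-grained buckets mean many more rows.
import numpy as np

EMBED_DIM = 16                      # assumed embedding width

coarse_vocab = ["shampoo", "soap", "toothpaste"]
fine_vocab = [f"{p}|{g}|{a}"        # product x gender x age-group buckets
              for p in coarse_vocab
              for g in ("men", "women", "unisex")
              for a in ("18-24", "25-44", "45+")]

# One row per category value: the finer split yields a 9x larger table.
coarse_table = np.random.randn(len(coarse_vocab), EMBED_DIM)
fine_table = np.random.randn(len(fine_vocab), EMBED_DIM)

print(len(coarse_vocab), "coarse rows vs", len(fine_vocab), "fine rows")
```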

    Desai says that the increased categorization has caused the size of the memory structures used for training, which are called embedding tables, to “explode.”

The memory requirements of DLRM models have grown to the point where “embedding tables account for more than 99.9% of the overall memory footprint,” as Desai put it. “Multiple issues arise as a result of this. For instance, the model must be partitioned and distributed across multiple training nodes and GPUs, making purely parallel training impossible. Additionally, once trained and in production, roughly 80% of the time needed to return a suggestion to a user is spent looking up information in embedding tables.”
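A rough back-of-the-envelope calculation, using assumed table sizes rather than figures from the paper, shows how quickly embedding tables come to dominate a DLRM's memory footprint:

```python
# Back-of-the-envelope footprint of DLRM embedding tables, with
# illustrative (assumed) sizes, to show why they dominate memory.
def table_bytes(num_rows: int, dim: int, bytes_per_param: int = 4) -> int:
    return num_rows * dim * bytes_per_param

# A few very large categorical features (hypothetical vocabularies).
tables = {
    "user_id":  table_bytes(500_000_000, 128),   # ~256 GB
    "item_id":  table_bytes(100_000_000, 128),   # ~51 GB
    "category": table_bytes(1_000_000, 128),     # ~0.5 GB
}
dense_layers = 50_000_000 * 4        # ~50M dense parameters in fp32

total = sum(tables.values()) + dense_layers
print(f"embedding share of all parameters: {sum(tables.values()) / total:.3%}")
# With tables this large, embeddings are well over 99.9% of the footprint.
```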

According to Shrivastava, the ROBE Array eliminates the need to store embedding tables by using hashing, a data-indexing technique, to create “a condensed form of the embedding table represented as a single array of learned parameters.”

Embedding information can then be accessed from that array “using GPU-friendly universal hashing,” he said.
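Here is a minimal sketch of that idea as described above: keep one flat array of learned parameters and use a cheap universal hash to pick, block by block, which entries of the array serve as a category's embedding row. The class name, hash constants, and sizes are illustrative assumptions, not the authors' reference implementation:

```python
# Sketch of a ROBE-style lookup: no per-category rows are stored; a
# universal hash maps (category ID, chunk index) to an offset in one
# shared flat array, and a contiguous block is read from that offset.
import numpy as np

P = 2_147_483_647                   # large prime for universal hashing
A, B = 98_765_431, 12_345           # assumed random hash coefficients

class RobeStyleEmbedding:
    def __init__(self, memory_size: int, dim: int, block_size: int):
        self.mem = (np.random.randn(memory_size) * 0.01).astype(np.float32)
        self.dim = dim
        self.block = block_size      # contiguous chunk fetched per hash

    def lookup(self, category_id: int) -> np.ndarray:
        out = np.empty(self.dim, dtype=np.float32)
        m = len(self.mem)
        for i in range(0, self.dim, self.block):
            # universal hash of (id, chunk index) -> offset into the array
            h = ((A * (category_id * 1_000_003 + i) + B) % P) % m
            n = min(self.block, self.dim - i)
            idx = (h + np.arange(n)) % m          # wrap at the array end
            out[i:i + n] = self.mem[idx]
        return out

emb = RobeStyleEmbedding(memory_size=1_000_000, dim=128, block_size=32)
vec = emb.lookup(category_id=42)     # 128-dim vector from a 1M-float array
print(vec.shape)
```

Reading contiguous blocks rather than scattered individual elements is what keeps lookups of this kind cache- and GPU-friendly.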

Shrivastava, Desai, and Chou put the ROBE Array through its paces using the popular DLRM MLPerf benchmark, which measures how quickly a system can train models to a target quality metric. They tested it on a number of benchmark data sets and found that, despite a three-orders-of-magnitude reduction in model size, it achieved training accuracy that was the same as or better than previously published DLRM techniques.

“Our results clearly show that most deep-learning benchmarks can be completely overturned by fundamental algorithms,” Shrivastava said. Given the worldwide shortage of chips, that is good news for the future of AI.

    Shrivastava’s contribution to MLSys doesn’t begin and end with the ROBE Array. In 2020, his team presented SLIDE at MLSys, a “sub-linear deep learning engine” that could outperform GPU-based trainers while still running on commodity CPUs. In a follow-up presentation at MLSys 2021, they showed how vectorization and memory optimization accelerators could improve SLIDE’s performance even more, letting it train deep neural networks up to 15 times faster than the best GPU systems.
