Machine learning is unlike any other enterprise application, demanding massive datasets from distributed sources. In this episode, Bin Fan of Alluxio joins Frederic Van Haren and Stephen Foskett to discuss the unique challenges of supporting ML workloads with distributed, heterogeneous data. The systems supporting AI training are unique, with GPUs and other AI accelerators distributed across multiple machines, each accessing the same massive set of small files. Conventional storage solutions are not equipped to serve parallel access to such a large number of small files, and they often become a performance bottleneck in machine learning training. Another issue is moving data across silos, storage systems, and protocols, which is impossible with most solutions.
Three Questions:
Guests and Hosts
Date: 3/15/2022 Tags: @SFoskett, @FredericVHaren, @BinFan, @Alluxio