High-performance, low-cost machine learning infrastructure is accelerating innovation in the cloud
Artificial intelligence and machine learning (AI and ML) are key technologies that help organizations develop new ways to increase sales, reduce costs, streamline business processes, and understand their customers better. AWS helps customers accelerate their AI/ML adoption by delivering powerful compute, high-speed networking, and scalable high-performance storage options on demand for any machine learning project. This lowers the barrier to entry for organizations looking to adopt the cloud to scale their ML applications.
Developers and data scientists are pushing the boundaries of the technology and are increasingly adopting deep learning, a type of machine learning based on neural network algorithms. These deep learning models are larger and more sophisticated, which increases the cost of running the underlying infrastructure to train and deploy them.
To help customers accelerate their AI/ML transformation, AWS is building high-performance, low-cost machine learning chips. AWS Inferentia is the first machine learning chip built from the ground up by AWS for the lowest-cost machine learning inference in the cloud. Amazon EC2 Inf1 instances, powered by Inferentia, deliver 2.3x higher performance and up to 70% lower cost for machine learning inference than current-generation GPU-based EC2 instances. AWS Trainium, the second machine learning chip from AWS, is purpose-built for training deep learning models and will be available in late 2021.
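The performance and cost figures are linked through cost per inference. As a rough sketch, assume an instance billed at an hourly price $P$ sustains a throughput of $T$ inferences per second (these symbols are introduced here for illustration and do not come from AWS). Its cost per inference, and the ratio between an Inf1 instance and a GPU-based instance, are then

$$
C = \frac{P}{3600\,T}, \qquad
\frac{C_{\text{Inf1}}}{C_{\text{GPU}}} = \frac{P_{\text{Inf1}}}{P_{\text{GPU}}}\cdot\frac{T_{\text{GPU}}}{T_{\text{Inf1}}}.
$$

If the 2.3x figure is read as a throughput ratio $T_{\text{Inf1}}/T_{\text{GPU}} = 2.3$, throughput alone would bring cost per inference to $1/2.3 \approx 0.43$ of the GPU baseline at equal hourly prices. Reaching a 70% reduction (a ratio of $0.30$) would additionally require an hourly price ratio of $0.30 \times 2.3 = 0.69$. Both figures are best-case claims that need not come from the same workload, so this is only a way to reason about how throughput and price combine.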
Customers across industries have deployed their ML applications in production on Inferentia and have seen significant performance improvements and cost savings. For example, Airbnb's customer support platform enables intelligent, scalable, and exceptional service experiences for its community of millions of hosts and guests around the world. It used Inferentia-based EC2 Inf1 instances to deploy the natural language processing (NLP) models that power its chatbots. This delivered a 2x improvement in performance out of the box compared with GPU-based instances.
With these silicon innovations, AWS enables customers to train and run their deep learning models in production quickly, with high performance and throughput at lower cost.
Machine learning challenges are speeding the shift to cloud-based infrastructure
Machine learning is an iterative process that requires teams to build, train, and deploy applications quickly, as well as train, retrain, and experiment frequently to increase the prediction accuracy of their models. When deploying trained models into their business applications, organizations also need to scale those applications to serve new users around the world. They need to be able to serve many concurrent requests with near-real-time latency to ensure a superior user experience.
Emerging use cases such as object detection, natural language processing (NLP), image classification, conversational AI, and time series analysis rely on deep learning technology. Deep learning models are growing rapidly in size and complexity, going from millions of parameters to billions in the span of a year.
Training and deploying these complex and sophisticated models translates into significant infrastructure costs. Those costs can quickly snowball as organizations scale their applications to deliver near-real-time experiences to their users and customers.
This is where cloud-based machine learning infrastructure services can help. The cloud provides on-demand access to compute, high-performance networking, and large-scale data storage, seamlessly combined with ML operations and higher-level AI services, so that organizations can get started immediately and scale their AI/ML initiatives.
How AWS helps customers accelerate their AI/ML transition
AWS Inferentia and AWS Trainium aim to democratize machine learning and make it accessible to developers regardless of experience or organization size. Inferentia's design is optimized for high performance, high throughput, and low latency, which makes it well suited to deploying ML inference at scale.
Each AWS Inferentia chip contains four NeuronCores that implement a high-performance systolic array matrix multiply engine, which greatly speeds up common deep learning operations such as convolutions and transformers. NeuronCores are also equipped with a large on-chip cache, which reduces external memory accesses, lowering latency and increasing throughput.
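To illustrate what a systolic array matrix multiply engine does, the sketch below simulates a generic output-stationary systolic array in plain Python. It is an assumption for illustration only, not a description of how NeuronCores are actually built, and `systolic_matmul` is a hypothetical name. Each processing element (PE) at grid position (i, j) keeps a running sum for C[i][j]; rows of A enter from the left edge and columns of B from the top edge, each delayed by one cycle per row or column, so matching operands meet in the right PE at the right cycle.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing A @ B.

    Illustrative sketch only; not a model of any specific hardware.
    """
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions must match")
    m = len(B[0])

    acc = [[0] * m for _ in range(n)]    # per-PE accumulators (results stay in place)
    a_reg = [[0] * m for _ in range(n)]  # A operand each PE forwards to the right
    b_reg = [[0] * m for _ in range(n)]  # B operand each PE forwards downward

    # The last operand pair reaches PE (n-1, m-1) at cycle (k-1) + (n-1) + (m-1).
    for t in range(k + n + m - 2):
        next_a = [[0] * m for _ in range(n)]
        next_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    # Row i of A is fed in from the left edge, delayed by i cycles.
                    s = t - i
                    a_in = A[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]  # value the left neighbour held last cycle
                if i == 0:
                    # Column j of B is fed in from the top edge, delayed by j cycles.
                    s = t - j
                    b_in = B[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]  # value the upper neighbour held last cycle
                acc[i][j] += a_in * b_in    # multiply-accumulate inside the PE
                next_a[i][j] = a_in
                next_b[i][j] = b_in
        a_reg, b_reg = next_a, next_b       # all PEs latch their outputs together
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each PE only exchanges data with its right and lower neighbours, and every operand that enters the array is reused across a whole row or column of PEs without being fetched from memory again, which is the property that makes this layout attractive for dense matrix math.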
AWS Neuron, the software development kit for Inferentia, natively supports leading ML frameworks such as TensorFlow and PyTorch. Developers can continue to use the same frameworks and lifecycle development tools they know and love. For most of their trained models, they can compile and deploy them on Inferentia by changing just one line of code, with no additional changes to the application code.
The result is a high-performance inference deployment that scales easily while keeping costs under control.
Sprinklr, a software-as-a-service company, has an AI-driven unified customer experience management platform that enables companies to gather real-time customer feedback across multiple channels and translate it into actionable insights. This results in proactive issue resolution, enhanced product development, improved content marketing, and better customer service. Sprinklr used Inferentia to deploy its NLP and some of its computer vision models and saw significant performance improvements.
Many Amazon services also deploy their machine learning models on Inferentia.
Amazon Prime Video uses computer vision ML models to analyze the video quality of live events to ensure the best viewing experience for Prime Video members. It deployed image classification ML models on EC2 Inf1 instances and saw a 4x improvement in performance and up to 40% cost savings compared to GPU-based instances.
Another example is Amazon Alexa's AI and ML-based intelligence, powered by Amazon Web Services, which is available on more than 100 million devices today. Alexa's promise to customers is that it keeps getting smarter, more conversational, more proactive, and more delightful. Delivering on that promise requires continuous improvements in response times and machine learning infrastructure costs. By deploying Alexa's text-to-speech ML models on Inf1 instances, the team was able to lower inference latency by 25% and cost per inference by 30%, improving the service experience for the tens of millions of customers who use Alexa every month.
Unleashing new machine learning capabilities in the cloud
As companies race to future-proof their businesses by enabling the most advanced digital products and services, no organization can afford to fall behind in deploying sophisticated machine learning models to help transform their customer experiences. Over the past few years, there has been a huge increase in the use of machine learning for a variety of use cases, from personalization and churn prediction to fraud detection and supply chain forecasting.
Fortunately, machine learning infrastructure in the cloud is unleashing new capabilities that were previously not possible and making them far more accessible to non-expert practitioners. That's why AWS customers are already using Inferentia-powered Amazon EC2 Inf1 instances to provide the intelligence behind their recommendation engines and chatbots and to extract actionable insights from customer feedback.
With AWS cloud-based machine learning infrastructure options suited to different skill levels, it is clear that any organization can accelerate innovation and embrace the entire machine learning lifecycle at scale. As machine learning becomes more pervasive, organizations are now able to fundamentally transform the customer experience, and the way they do business, with cost-effective, high-performance cloud-based machine learning infrastructure.
Learn more about how the AWS machine learning platform can help your company transform.
This content was produced by AWS. It was not written by MIT Technology Review's editorial staff.