Google Open-Sources Trillion-Parameter AI Language Model Switch Transformer – InfoQ.com


Researchers at Google Brain have open-sourced the Switch Transformer, a natural-language processing (NLP) AI model. The model scales up to 1.6T parameters and trains up to 7x faster than the T5 NLP model, with comparable accuracy.

The team described the model in a paper published on arXiv. The Switch Transformer uses a mixture-of-experts (MoE) paradigm, replacing each Transformer feed-forward layer with a set of expert sub-networks and routing every input token to only one of them. Because only a subset of the model is used to process a given input, the number of model parameters can be increased while holding computational cost steady. Compared to Google’s state-of-the-art T5 NLP model, baseline versions of the Switch Transformer can achieve target pre-training perplexity metrics in 1/7 the training time. The 1.6T-parameter version outperforms T5-XXL on the perplexity metric, with comparable or better performance on downstream NLP tasks, despite training on half as much data.
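
As a rough back-of-the-envelope illustration of why this decouples parameter count from compute (the layer sizes below are hypothetical and not taken from the paper), consider a feed-forward block replicated into many experts, with each token visiting exactly one of them:

```python
# Hypothetical sizes for illustration only; not the Switch Transformer's actual configuration.

def ffn_params(d_model: int, d_ff: int) -> int:
    """Weight count of a two-layer feed-forward block (biases ignored)."""
    return d_model * d_ff + d_ff * d_model

def moe_ffn_params(d_model: int, d_ff: int, num_experts: int) -> int:
    """The same block replicated once per expert, plus a small routing matrix."""
    return num_experts * ffn_params(d_model, d_ff) + d_model * num_experts

d_model, d_ff, experts = 1024, 4096, 128
print(f"dense FFN params: {ffn_params(d_model, d_ff):,}")               # ~8.4M
print(f"MoE FFN params:   {moe_ffn_params(d_model, d_ff, experts):,}")  # ~1.07B

# Per-token FLOPs remain roughly 2 * ffn_params(d_model, d_ff) in both cases,
# because each token is processed by exactly one expert.
```

Growing the expert count therefore multiplies the parameters stored while leaving the work done per token essentially unchanged, which is the effect the Switch Transformer exploits.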

The Transformer architecture has become the primary deep-learning model used for NLP research. Recent efforts have focused on increasing the size of these models, measured in number of parameters, with results that can exceed human performance. A team from OpenAI, creators of the GPT-3 model, found that NLP performance does indeed scale with the number of parameters, following a power-law relationship. In developing the Switch Transformer, the Google Brain team sought to maximize parameter count while keeping constant the number of FLOPs per training example and training on “relatively small amounts of data.”
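
The scaling relationship referenced above is commonly summarized as a power law of roughly the following form; the constants are empirical fits, and this schematic is only a reminder of the shape of the curve, not a result quoted from either paper:

```latex
% Schematic empirical scaling law: test loss L as a function of
% non-embedding parameter count N, with fitted constants N_c and \alpha_N.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
```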

To achieve this, the model uses a mixture-of-experts (MoE) scheme. MoE was first described in 1991 by a research team that included deep-learning pioneer Geoff Hinton, then at the University of Toronto and now at Google Brain. In 2017, Hinton and Google Brain colleagues used MoE to create an NLP model based on a recurrent neural network (RNN) with 137B parameters, which achieved state-of-the-art results on language-modeling and machine-translation benchmarks.

The Switch Transformer uses a modified MoE algorithm called Switch Routing: instead of activating multiple experts and combining their outputs, Switch Routing chooses a single expert to handle a given input. This simplifies the routing computation and reduces communication costs, since individual expert models are hosted on different devices. One drawback of the scheme, however, is an increased chance of training instability, especially when using reduced-precision arithmetic, because of the “hard” switching decisions. The team mitigated this by reducing the scale factor used to initialize the model parameters.
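
A minimal PyTorch sketch of top-1 (“switch”) routing may make the idea concrete. The class and parameter names here are illustrative assumptions, and the sketch omits the paper’s expert-capacity limits, load-balancing auxiliary loss, and cross-device dispatch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Toy switch-routed feed-forward layer: each token is sent to exactly one
    expert, chosen by an argmax over router logits (illustrative only)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model) -- a flattened batch of token representations
        probs = F.softmax(self.router(x), dim=-1)   # routing probabilities per token
        gate, expert_idx = probs.max(dim=-1)        # top-1: one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                  # tokens routed to expert i
            if mask.any():
                # scale by the gate value so the routing decision stays differentiable
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Example: 10 tokens of width 16 routed across 4 experts
layer = SwitchFFN(d_model=16, d_ff=64, num_experts=4)
print(layer(torch.randn(10, 16)).shape)  # torch.Size([10, 16])
```

Because each token activates a single expert, adding experts grows the parameter count without adding per-token matrix multiplications; in the full-scale model the experts live on separate devices and tokens must be shipped to them, which is the communication cost noted above.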

The team used Mesh-TensorFlow (MTF) to train the model, taking advantage of data- and model-parallelism. To investigate the performance of the architecture at different scales, the team trained models of different sizes, from 223M parameters up to 1.6T parameters, finding that the “most efficient dimension for scaling” was the number of experts. Model performance on pre-training and downstream NLP tasks was compared to that of T5 models requiring similar FLOPs per sample. Baseline-sized Switch Transformer models outperformed T5 on the GLUE, SuperGLUE, and SQuAD benchmarks while achieving a 7x speedup in pre-training time. The large-scale Switch Transformer, with 1.6T parameters and 2048 experts, outperformed a 13B-parameter T5 model on pre-training perplexity while finishing in 1/4 the time.

In a discussion on Reddit, commenters pointed out that the Google Brain team did not compare their model’s performance to GPT-3, speculating that this was due to a lack of information in OpenAI’s published results. Another commenter noted:

[T]he time to accuracy gains are remarkable, albeit coming at a cost for hardware requirements. All these are non-issues for Google, but I can see why OpenAI isn’t too keen on these models, at least, so far.

Although Google has not released the pre-trained model weights for the Switch Transformer, the implementation code is available on GitHub.
 


Instagram Expands Live Video to Meet Covid Demand for Content – BNN


(Bloomberg) — Instagram is expanding its real-time broadcast service to allow creators greater freedom to collaborate on videos.

Facebook Inc., Instagram’s parent, debuted a new feature on Monday called Live Rooms, which will allow as many as four people to broadcast simultaneously. Previously, Instagram users could only stream with one other person, according to the company. The photo-sharing app first tested the new tool in India and Indonesia last fall.

Instagram is hoping creators will take advantage of the new feature to stream podcasts, talk shows, concerts and other content at a time when the pandemic is sending more users to the platform for at-home entertainment.

©2021 Bloomberg L.P.


OnePlus teases March 8th announcement with a single photo – MobileSyrup



OnePlus has started drumming up hype for its next set of devices with an image and the promise of news coming on March 8th.

You can find the teaser image on the OnePlus website. The Verge notes that it’s very similar to the iconic photo ‘Earthrise’ from the Apollo 8 mission.

That photo was taken with a Hasselblad camera, which is fitting given that some rumours point to the camera company partnering with OnePlus.

We’ve been expecting news from OnePlus in March, so it’s nice to see that’s still happening despite the ongoing COVID-19 pandemic.

Beyond the new camera module, rumours suggest that the company will use new high-end display tech in the OnePlus 9 Pro.

Source: OnePlus Via: The Verge


Instagram Live Rooms allows four users to go live simultaneously – MobileSyrup


Instagram has announced its latest feature, ‘Live Rooms.’

Previously, users could only go live with one other person at a time, but now the social media platform allows up to four users to broadcast together at once, twice the previous limit.

To use the feature, swipe left and select the Live camera option. Following that, add a title and then tap the Rooms icon to add guests. You can search for a guest to add, or add one of the people who have requested to go live with you.

The user who starts the room appears at the top of the screen after adding guests. Broadcasters can add up to three guests, either all at once or one by one. People blocked by any of the active users in a Live Room will not be able to join it. Live Rooms also offer the ability to report and block comments and to apply comment filters.

Instagram says that Live Rooms is launching globally soon.

It seems like Instagram is trying to compete with Clubhouse, a social audio app that allows more than 10 people to go live at once in a single room; Clubhouse rooms can also hold more than 8,000 people.

Instagram Live, however, requires its users to go on camera, which sets it apart from Clubhouse and Twitter’s ‘Spaces.’

Source: Instagram
