Researchers at Google Brain have open-sourced the Switch Transformer, a natural-language processing (NLP) AI model. The model scales up to 1.6T parameters and speeds up pre-training by as much as 7x compared to the T5 NLP model, while achieving comparable accuracy.
The team described the model in a paper published on arXiv. The Switch Transformer uses a mixture-of-experts (MoE) paradigm to combine several Transformer attention blocks. Because only a subset of the model is used to process a given input, the number of model parameters can be increased while holding computational cost steady. Compared to Google’s state-of-the-art T5 NLP model, baseline versions of the Switch Transformer can achieve target pre-training perplexity metrics in 1/7 the training time. The 1.6T-parameter version outperforms a T5-XXL on the perplexity metric, with comparable or better performance on downstream NLP tasks, despite training on half the data.
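The parameter-versus-compute trade-off described above can be sketched with some back-of-the-envelope arithmetic. The layer sizes and expert counts below are illustrative assumptions, not the paper's actual configuration:

```python
# Back-of-the-envelope sketch of the MoE scaling idea. The values of
# d_model, d_ff, and the expert counts are made-up assumptions.
d_model, d_ff = 1024, 4096
expert_params = 2 * d_model * d_ff  # one dense FFN expert (two weight matrices)

for n_experts in (1, 8, 64):
    total_params = n_experts * expert_params
    # With top-1 routing, each token is processed by exactly one
    # expert, so per-token compute does not grow with n_experts.
    flops_per_token = 2 * expert_params
    print(f"{n_experts:>3} experts: {total_params:>13,} params, "
          f"{flops_per_token:,} FLOPs/token")
```

The point of the sketch is that total parameter count grows linearly with the number of experts while per-token FLOPs stay fixed, which is how the Switch Transformer reaches 1.6T parameters without a matching increase in training cost per example.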
The Transformer architecture has become the primary deep-learning model used for NLP research. Recent efforts have focused on increasing the size of these models, measured in number of parameters, with results that can exceed human performance. A team from OpenAI, creators of the GPT-3 model, found that NLP performance does indeed scale with the number of parameters, following a power-law relationship. In developing the Switch Transformer, the Google Brain team sought to maximize parameter count while keeping constant the number of FLOPs per training example and training on “relatively small amounts of data.”
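The power-law relationship reported by OpenAI has the rough form L(N) ≈ (N_c / N)^α, where N is the parameter count. The constants below are approximate values from the published scaling-laws work; treat this as a sketch of the shape of the curve, not the exact fit:

```python
# Illustrative form of the parameter-count scaling law (Kaplan et al.,
# 2020): loss falls as a power of model size. The constants are
# approximate published values and are included for illustration only.
N_C = 8.8e13    # "critical" parameter count
ALPHA = 0.076   # scaling exponent for parameters

def predicted_loss(n_params: float) -> float:
    """Predicted cross-entropy loss as a function of model size."""
    return (N_C / n_params) ** ALPHA

for n in (1e8, 1e9, 1e10):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.2f}")
```

Because the exponent is small, each 10x increase in parameters buys only a modest loss reduction, which is why efficient ways to add parameters, like MoE, are attractive.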
To achieve this, the model uses a mixture-of-experts (MoE) scheme. MoE was developed in 1991 by a research team that included deep-learning pioneer and Switch Transformer co-creator Geoff Hinton, then at the University of Toronto and now at Google Brain. In 2017, Hinton and Google Brain colleagues used MoE to create an NLP model based on a recurrent neural network (RNN) with 137B parameters, which achieved state-of-the-art results on language-modeling and machine-translation benchmarks.
The Switch Transformer uses a modified MoE algorithm called Switch Routing: instead of activating multiple experts and combining their outputs, Switch Routing chooses a single expert to handle a given input. This simplifies the routing computation and reduces communication costs, since individual expert models are hosted on different accelerator devices. One drawback of the scheme, however, is an increased chance of training instability, especially when using reduced-precision arithmetic, due to the “hard” switching decisions. The team mitigated this by reducing the scale factor used to initialize the model parameters.
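The top-1 routing decision can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the paper's Mesh-TensorFlow implementation; the function and argument names are ours:

```python
import numpy as np

def switch_route(x, w_router, experts):
    """Top-1 'Switch' routing sketch: each token is dispatched to
    exactly one expert, and that expert's output is scaled by the
    router probability so the routing stays differentiable.

    x        : (tokens, d_model) token representations
    w_router : (d_model, n_experts) router weights
    experts  : list of callables, each mapping (d_model,) -> (d_model,)
    """
    logits = x @ w_router                        # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)   # softmax over experts
    chosen = probs.argmax(axis=-1)               # top-1 expert per token
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        # Only one expert runs per token, so compute per token is
        # constant regardless of how many experts exist.
        out[i] = probs[i, e] * experts[e](x[i])
    return out, chosen
```

In the real model each expert is a feed-forward block sharded onto its own device, and the argmax is where the “hard” switching decision, and hence the instability the team mitigated, comes from.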
The team used Mesh-TensorFlow (MTF) to train the model, taking advantage of data- and model-parallelism. To investigate the performance of the architecture at different scales, the team trained models of different sizes, from 223M parameters up to 1.6T parameters, finding that the “most efficient dimension for scaling” was the number of experts. Model performance on pre-training and downstream NLP tasks was compared to T5 models requiring similar FLOPs per sample. Baseline-sized Switch Transformer models outperformed T5 on GLUE, SuperGLUE, and SQuAD benchmarks, while achieving a 7x speedup on pre-training time. The large-scale Switch Transformer, with 1.6T parameters and 2048 experts, outperformed a 13B-parameter T5 model in pre-training perplexity, while finishing in 1/4 the time.
In a discussion on Reddit, commenters pointed out that the Google Brain team did not compare their model’s performance to GPT-3, speculating that this was due to a lack of detail in OpenAI’s published results. Another commenter noted:
[T]he time to accuracy gains are remarkable, albeit coming at a cost for hardware requirements. All these are non-issues for Google, but I can see why OpenAI isn’t too keen on these models, at least, so far.
Although Google has not released the pre-trained model weights for the Switch Transformer, the implementation code is available on GitHub.
Instagram Expands Live Video to Meet Covid Demand for Content – BNN
(Bloomberg) — Instagram is expanding its real-time broadcast service to allow creators greater freedom to collaborate on videos.
Facebook Inc., Instagram’s parent, debuted a new feature on Monday called Live Rooms, which will allow as many as four people to broadcast simultaneously. Previously, Instagram users could only stream with one other person, according to the company. The photo-sharing app first tested the new tool in India and Indonesia last fall.
Instagram is hoping creators will take advantage of the new feature to stream podcasts, talk shows, concerts and other content at a time when the pandemic is sending more users to the platform for at-home entertainment.
©2021 Bloomberg L.P.
OnePlus teases March 8th announcement with a single photo – MobileSyrup
OnePlus has started drumming up hype for its next set of devices with an image and the promise of news coming on March 8th.
The teaser photo was taken with a Hasselblad camera, which makes sense given rumours that the photography company is partnering with OnePlus.
— Pete Lau (@PeteLau) March 1, 2021
We’ve been expecting news from OnePlus in March, so it’s nice to see that’s still happening despite the ongoing COVID-19 pandemic.
Beyond the new camera module, rumours suggest that the company will use a new high-end display tech in the OnePlus 9 Pro.
Instagram Live Rooms allows four users to go live simultaneously – MobileSyrup
Instagram has announced its latest feature, ‘Live Rooms.’
Previously, users could only go live with one other person at a time; now the platform supports twice that number, with up to four users broadcasting at once.
To use the feature, swipe left and select the Live camera option. Then add a title and tap the Rooms icon to add guests. You can search for a guest to add, or add one of the people who have requested to go live with you.
The user who starts the room appears at the top of the screen after adding guests. That broadcaster can add up to three guests, either all at once or one by one. Anyone blocked by any of the active users in the Live Room will not be able to join. Live Rooms also offer the ability to report and block comments and to apply comment filters.
Instagram says that Live Rooms is launching globally soon.
It seems like Instagram is trying to compete against Clubhouse, a social audio app that allows more than 10 people to go live at once in a single room. Rooms can also have more than 8,000 people in them.
Instagram Live, however, requires its users to go on camera, which sets it apart from Clubhouse and Twitter’s ‘Spaces.’