Uber's Journey to Modernizing Big Data Infrastructure with Google Cloud Platform | Canada News Media

Tech

Uber’s Journey to Modernizing Big Data Infrastructure with Google Cloud Platform


In a recent post on its official engineering blog, Uber disclosed its strategy to migrate its batch data analytics and machine learning (ML) training stack to Google Cloud Platform (GCP). Uber runs one of the largest Hadoop installations in the world, managing over an exabyte of data across tens of thousands of servers in each of its two regions. The open-source data ecosystem, particularly Hadoop, has been the cornerstone of its data platform.

The migration plan consists of two phases: an initial migration, followed by the adoption of cloud-native services. In the first phase, Uber will use GCP’s object store for data lake storage while migrating the rest of its data stack to GCP’s Infrastructure as a Service (IaaS). This approach allows for a swift migration with minimal disruption to existing jobs and pipelines, because the team can replicate the exact versions of its on-premises software stack, engines, and security model on IaaS. Following this phase, the Uber engineering team plans to gradually adopt GCP’s Platform as a Service (PaaS) offerings, such as Dataproc and BigQuery, to fully harness the elasticity and performance benefits of cloud-native services.

Scope of migration (source: Uber’s blog)

Once the initial migration is complete, the team will focus on integrating cloud-native services to maximize the data infrastructure’s performance and scalability. This phased approach ensures that Uber users, from dashboard owners to ML practitioners, experience a seamless transition without altering their existing workflows or services.

To ensure a smooth and efficient migration, the Uber team has established several guiding principles:

  1. Minimize user disruption by moving the majority of the batch data stack onto cloud IaaS as-is; the team aims to shield users from any changes to their artifacts or services. By using well-known abstractions and open standards, it strives to make the migration as transparent as possible.
  2. Rely on a cloud storage connector that implements the Hadoop FileSystem interface for Google Cloud Storage, ensuring HDFS compatibility. By standardizing its Apache Hadoop HDFS clients, the team will abstract the specifics of the on-premises HDFS implementation, allowing seamless integration with GCP’s storage layer.
  3. Use the data access proxies the Uber team has developed for Presto, Spark, and Hive, which abstract the underlying physical compute clusters. These proxies will support the selective routing of test traffic to cloud-based clusters during the testing phase and will route all queries and jobs to the cloud stack during the full migration.
  4. Utilize Uber’s cloud-agnostic infrastructure. Uber’s existing container environment, computing platform, and deployment tools are built to be agnostic between cloud and on-premises. These platforms will enable the team to easily expand its batch data ecosystem microservices onto the cloud IaaS.
  5. The team will build and enhance existing data management services to support selected and approved cloud services, ensuring robust data governance. The company aims to maintain the same levels of authorized access and security as on-premises, while supporting seamless user authentication against the object store data lake and other cloud services.
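
The selective routing described in the third principle can be illustrated with a minimal sketch. This is not Uber’s proxy code: the function name `choose_cluster`, the `cloud_percent` setting, and the idea of hashing a job ID are all assumptions made for illustration. The sketch shows one common way to send a fixed share of traffic to a new cluster while keeping each job’s destination stable.

```python
import hashlib

# Hypothetical cluster labels; a real proxy would hold connection details.
ON_PREM = "on_prem"
CLOUD = "cloud"


def choose_cluster(job_id: str, cloud_percent: int) -> str:
    """Pick a cluster for a job.

    cloud_percent is the share of traffic (0 to 100) sent to the cloud.
    Hashing the job ID keeps the decision stable across retries.
    """
    if not 0 <= cloud_percent <= 100:
        raise ValueError("cloud_percent must be between 0 and 100")
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return CLOUD if slot < cloud_percent else ON_PREM


if __name__ == "__main__":
    # During testing a small share goes to the cloud; at cutover, all of it.
    print(choose_cluster("daily-trips-report", 5))
    print(choose_cluster("daily-trips-report", 100))
```

Setting `cloud_percent` to 0 keeps everything on-premises and setting it to 100 corresponds to the full migration, so the same code path could serve both the test phase and the cutover.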

Uber’s batch data stack before and after migration (source: Uber’s blog)

The Uber team focuses on bucket mapping and cloud resource layout for migration. Mapping HDFS files and directories to cloud objects in one or more buckets is critical. They need to apply IAM policies at varying levels of granularity, considering constraints on buckets and objects such as read/write throughput and IOPS throttling. The team aims to develop a mapping algorithm that satisfies these constraints and organizes data resources in an organization-centric hierarchical manner, improving data administration and management.
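
To make the mapping problem concrete, here is a minimal sketch of translating HDFS paths into object store URIs with a longest-prefix lookup. The article does not describe Uber’s actual algorithm; the table `PREFIX_TO_BUCKET`, the bucket names, and the function `hdfs_to_gcs` are hypothetical, and the sketch ignores the throughput and IAM constraints mentioned above.

```python
from typing import Dict

# Hypothetical mapping from HDFS directory prefixes to bucket names.
PREFIX_TO_BUCKET: Dict[str, str] = {
    "/warehouse/finance": "example-finance-lake",
    "/warehouse": "example-shared-lake",
}


def hdfs_to_gcs(path: str, mapping: Dict[str, str] = PREFIX_TO_BUCKET) -> str:
    """Translate an HDFS path to a gs:// URI using the longest matching prefix."""
    # Match whole path components only, so "/warehousefoo" does not match "/warehouse".
    candidates = [p for p in mapping if path == p or path.startswith(p + "/")]
    if not candidates:
        raise KeyError(f"no bucket mapped for {path}")
    prefix = max(candidates, key=len)
    object_name = path[len(prefix):].lstrip("/")
    return f"gs://{mapping[prefix]}/{object_name}"


if __name__ == "__main__":
    print(hdfs_to_gcs("/warehouse/finance/trips/part-0"))
    print(hdfs_to_gcs("/warehouse/ops/events/part-7"))
```

Giving a team’s directory its own bucket, as the `/warehouse/finance` entry does here, is one way a layout could follow organizational boundaries so that access policies can be set per bucket.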

Security integration is another workstream. Adapting Uber’s existing Kerberos-based tokens and Hadoop delegation tokens for cloud PaaS, particularly Google Cloud Storage (GCS), is essential. This workstream aims to support seamless user, group, and service account authentication and authorization, maintaining the same access levels as on-premises.

The team also focuses on data replication. HiveSync, Uber’s permissions-aware bidirectional data replication service, allows the company to operate in active-active mode. The team is extending HiveSync’s capabilities to replicate the on-premises data lake’s data to the cloud-based data lake and the corresponding Hive Metastore. This includes an initial bulk migration and ongoing incremental updates until the cloud-based stack becomes the primary.
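
The pattern of a bulk copy followed by incremental updates can be sketched in a few lines. This is a deliberately simplified, one-directional illustration and not HiveSync itself, which the article describes as bidirectional and permissions-aware. The functions `plan_copy` and `apply_copy` and the use of integer versions per path are assumptions.

```python
from typing import Dict, List


def plan_copy(source: Dict[str, int], target: Dict[str, int]) -> List[str]:
    """Return paths whose source version is missing or newer than the target's."""
    return sorted(p for p, v in source.items() if target.get(p, -1) < v)


def apply_copy(source: Dict[str, int], target: Dict[str, int], paths: List[str]) -> None:
    """Record the copied versions on the target (stands in for the real data copy)."""
    for p in paths:
        target[p] = source[p]


if __name__ == "__main__":
    on_prem = {"/warehouse/trips": 1, "/warehouse/riders": 1}
    cloud: Dict[str, int] = {}

    # First run: nothing exists on the target, so the plan is a bulk copy.
    first = plan_copy(on_prem, cloud)
    apply_copy(on_prem, cloud, first)

    # Later run: only changed or new paths are copied.
    on_prem["/warehouse/riders"] = 2
    on_prem["/warehouse/eats"] = 1
    print(plan_copy(on_prem, cloud))
```

Because the first run finds an empty target, the same planning function covers both the initial bulk migration and the later incremental passes.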

The last workstream is provisioning new YARN and Presto clusters on GCP IaaS. Uber’s data access proxies will route query and job traffic to these cloud-based clusters during the migration, ensuring a smooth transition.

Uber’s big data migration to Google Cloud anticipates challenges like performance differences in storage and unforeseen issues due to its legacy system. The team plans to address these by leveraging open-source tools, utilizing cloud elasticity for cost management, migrating non-core uses to dedicated storage, and proactively testing integrations and deprecating outdated practices.

 


Tech

AI could help scale humanitarian responses. But it could also have big downsides


 

NEW YORK (AP) — As the International Rescue Committee copes with dramatic increases in displaced people in recent years, the refugee aid organization has looked for efficiencies wherever it can — including using artificial intelligence.

Since 2015, the IRC has invested in Signpost — a portfolio of mobile apps and social media channels that answer questions in different languages for people in dangerous situations. The Signpost project, which includes many other organizations, has reached 18 million people so far, but IRC wants to significantly increase its reach by using AI tools — if they can do so safely.

Conflict, climate emergencies and economic hardship have driven up demand for humanitarian assistance, with more than 117 million people forcibly displaced in 2024, according to the United Nations refugee agency. The turn to artificial intelligence technologies is in part driven by the massive gap between needs and resources.

To meet its goal of reaching half of displaced people within three years, the IRC is testing a network of AI chatbots to see if they can increase the capacity of their humanitarian officers and the local organizations that directly serve people through Signpost. For now, the pilot project operates in El Salvador, Kenya, Greece and Italy and responds in 11 languages. It draws on a combination of large language models from some of the biggest technology companies, including OpenAI, Anthropic and Google.

The chatbot response system also uses customer service software from Zendesk and receives other support from Google and Cisco Systems.

If they decide the tools work, the IRC wants to extend the technical infrastructure to other nonprofit humanitarian organizations at no cost. They hope to create shared technology resources that less technically focused organizations could use without having to negotiate directly with tech companies or manage the risks of deployment.

“We’re trying to really be clear about where the legitimate concerns are but lean into the optimism of the opportunities and not also allow the populations we serve to be left behind in solutions that have the potential to scale in a way that human to human or other technology can’t,” said Jeannie Annan, International Rescue Committee’s Chief Research and Innovation Officer.

The responses and information that Signpost chatbots deliver are vetted by local organizations to be up to date and sensitive to the precarious circumstances people could be in. An example query that IRC shared is of a woman from El Salvador traveling through Mexico to the United States with her son who is looking for shelter and for services for her child. The bot provides a list of providers in the area where she is.

More complex or sensitive queries are escalated for humans to respond.

The most important potential downside of these tools would be that they don’t work. For example, what if the situation on the ground changes and the chatbot doesn’t know? It could provide information that’s not just wrong, but dangerous.

A second issue is that these tools can amass a valuable honeypot of data about vulnerable people that hostile actors could target. What if a hacker succeeds in accessing data with personal information or if that data is accidentally shared with an oppressive government?

IRC said it’s agreed with the tech providers that none of their AI models will be trained on the data that the IRC, the local organizations or the people they are serving are generating. They’ve also worked to anonymize the data, including removing personal information and location.

As part of the Signpost.AI project, IRC is also testing tools like a digital automated tutor and maps that can integrate many different types of data to help prepare for and respond to crises.

Cathy Petrozzino, who works for the not-for-profit research and development company MITRE, said AI tools do have high potential, but also high risks. To use these tools responsibly, she said, organizations should ask themselves, does the technology work? Is it fair? Are data and privacy protected?

She also emphasized that organizations need to convene a range of people to help govern and design the initiative — not just technical experts, but people with deep knowledge of the context, legal experts, and representatives from the groups that will use the tools.

“There are many good models sitting in the AI graveyard,” she said, “because they weren’t worked out in conjunction and collaboration with the user community.”

For any system that has potentially life-changing impacts, Petrozzino said, groups should bring in outside experts to independently assess their methodologies. Designers of AI tools need to consider the other systems it will interact with, she said, and they need to plan to monitor the model over time.

Consulting with displaced people or others that humanitarian organizations serve may increase the time and effort needed to design these tools, but not having their input raises many safety and ethical problems, said Helen McElhinney, executive director of CDAC Network. It can also unlock local knowledge.

People receiving services from humanitarian organizations should be told if an AI model will analyze any information they hand over, she said, even if the intention is to help the organization respond better. That requires meaningful and informed consent, she said. They should also know if an AI model is making life-changing decisions about resource allocation and where accountability for those decisions lies, she said.

Degan Ali, CEO of Adeso, a nonprofit in Somalia and Kenya, has long been an advocate for changing the power dynamics in international development to give more money and control to local organizations. She asked how IRC and others pursuing these technologies would overcome access issues, pointing to the week-long power outages caused by Hurricane Helene in the U.S. Chatbots won’t help when there’s no device, internet or electricity, she said.

Ali also warned that few local organizations have the capacity to attend big humanitarian conferences where the ethics of AI are debated. Few have staff both senior enough and knowledgeable enough to really engage with these discussions, she said, though they understand the potential power and impact these technologies may have.

“We must be extraordinarily careful not to replicate power imbalances and biases through technology,” Ali said. “The most complex questions are always going to require local, contextual and lived experience to answer in a meaningful way.”

___

The Associated Press and OpenAI have a licensing and technology agreement that allows OpenAI access to part of AP’s text archives.

___

Associated Press coverage of philanthropy and nonprofits receives support through the AP’s collaboration with The Conversation US, with funding from Lilly Endowment Inc. The AP is solely responsible for this content. For all of AP’s philanthropy coverage, visit https://apnews.com/hub/philanthropy.


Tech

Ottawa orders TikTok’s Canadian arm to be dissolved


 

The federal government is ordering the dissolution of TikTok’s Canadian business after a national security review of the Chinese company behind the social media platform, but stopped short of ordering people to stay off the app.

Industry Minister François-Philippe Champagne announced the government’s “wind up” demand Wednesday, saying it is meant to address “risks” related to ByteDance Ltd.’s establishment of TikTok Technology Canada Inc.

“The decision was based on the information and evidence collected over the course of the review and on the advice of Canada’s security and intelligence community and other government partners,” he said in a statement.

The announcement added that the government is not blocking Canadians’ access to the TikTok application or their ability to create content.

However, it urged people to “adopt good cybersecurity practices and assess the possible risks of using social media platforms and applications, including how their information is likely to be protected, managed, used and shared by foreign actors, as well as to be aware of which country’s laws apply.”

Champagne’s office did not immediately respond to a request for comment seeking details about what evidence led to the government’s dissolution demand, how long ByteDance has to comply and why the app is not being banned.

A TikTok spokesperson said in a statement that the shutdown of its Canadian offices will mean the loss of hundreds of well-paying local jobs.

“We will challenge this order in court,” the spokesperson said.

“The TikTok platform will remain available for creators to find an audience, explore new interests and for businesses to thrive.”

The federal Liberals ordered a national security review of TikTok in September 2023, but it was not public knowledge until The Canadian Press reported in March that the government was investigating the company.

At the time, the government said the review was based on the expansion of a business, which it said constituted the establishment of a new Canadian entity. It declined to provide any further details about what expansion it was reviewing.

A government database showed a notification of new business from TikTok in June 2023. It said Network Sense Ventures Ltd. in Toronto and Vancouver would engage in “marketing, advertising, and content/creator development activities in relation to the use of the TikTok app in Canada.”

Even before the review, ByteDance and TikTok were lightning rods for privacy and safety concerns because Chinese national security laws compel organizations in the country to assist with intelligence gathering.

Such concerns led the U.S. House of Representatives to pass a bill in March designed to ban TikTok unless its China-based owner sells its stake in the business.

Champagne’s office has maintained Canada’s review was not related to the U.S. bill, which has yet to pass.

Canada’s review was carried out through the Investment Canada Act, which allows the government to investigate any foreign investment with the potential to harm national security.

While cabinet can make investors sell parts of the business or shares, Champagne has said the act doesn’t allow him to disclose details of the review.

Wednesday’s dissolution order was made in accordance with the act.

The federal government banned TikTok from its mobile devices in February 2023 following the launch of an investigation into the company by federal and provincial privacy commissioners.

— With files from Anja Karadeglija in Ottawa

This report by The Canadian Press was first published Nov. 6, 2024.

The Canadian Press. All rights reserved.


Health

Here is how to prepare your online accounts for when you die


 

LONDON (AP) — Most people have accumulated a pile of data — selfies, emails, videos and more — on their social media and digital accounts over their lifetimes. What happens to it when we die?

It’s wise to draft a will spelling out who inherits your physical assets after you’re gone, but don’t forget to take care of your digital estate too. Friends and family might treasure files and posts you’ve left behind, but they could get lost in digital purgatory after you pass away unless you take some simple steps.

Here’s how you can prepare your digital life for your survivors:

Apple

The iPhone maker lets you nominate a “legacy contact” who can access your Apple account’s data after you die. The company says it’s a secure way to give trusted people access to photos, files and messages. To set it up you’ll need an Apple device with a fairly recent operating system: iPhones and iPads need iOS or iPadOS 15.2, and MacBooks need macOS Monterey 12.1.

For iPhones, go to settings, tap Sign-in & Security and then Legacy Contact. You can name one or more people, and they don’t need an Apple ID or device.

You’ll have to share an access key with your contact. It can be a digital version sent electronically, or you can print a copy or save it as a screenshot or PDF.

Take note that there are some types of files you won’t be able to pass on — including digital rights-protected music, movies and passwords stored in Apple’s password manager. Legacy contacts can only access a deceased user’s account for three years before Apple deletes the account.

Google

Google takes a different approach with its Inactive Account Manager, which allows you to share your data with someone if it notices that you’ve stopped using your account.

When setting it up, you need to decide how long Google should wait — from three to 18 months — before considering your account inactive. Once that time is up, Google can notify up to 10 people.

You can write a message informing them you’ve stopped using the account, and, optionally, include a link to download your data. You can choose what types of data they can access — including emails, photos, calendar entries and YouTube videos.

There’s also an option to automatically delete your account after three months of inactivity, so your contacts will have to download any data before that deadline.

Facebook and Instagram

Some social media platforms can preserve accounts for people who have died so that friends and family can honor their memories.

When users of Facebook or Instagram die, parent company Meta says it can memorialize the account if it gets a “valid request” from a friend or family member. Requests can be submitted through an online form.

The social media company strongly recommends Facebook users add a legacy contact to look after their memorial accounts. Legacy contacts can do things like respond to new friend requests and update pinned posts, but they can’t read private messages or remove or alter previous posts. You can only choose one person, who also has to have a Facebook account.

You can also ask Facebook or Instagram to delete a deceased user’s account if you’re a close family member or an executor. You’ll need to send in documents like a death certificate.

TikTok

The video-sharing platform says that if a user has died, people can submit a request to memorialize the account through the settings menu. Go to the Report a Problem section, then Account and profile, then Manage account, where you can report a deceased user.

Once an account has been memorialized, it will be labeled “Remembering.” No one will be able to log into the account, which prevents anyone from editing the profile or using the account to post new content or send messages.

X

It’s not possible to nominate a legacy contact on Elon Musk’s social media site. But family members or an authorized person can submit a request to deactivate a deceased user’s account.

Passwords

Besides the major online services, you’ll probably have dozens if not hundreds of other digital accounts that your survivors might need to access. You could just write all your login credentials down in a notebook and put it somewhere safe. But making a physical copy presents its own vulnerabilities. What if you lose track of it? What if someone finds it?

Instead, consider a password manager that has an emergency access feature. Password managers are digital vaults that you can use to store all your credentials. Some, like Keeper, Bitwarden and NordPass, allow users to nominate one or more trusted contacts who can access their keys in case of an emergency such as a death.

But there are a few catches: Those contacts also need to use the same password manager and you might have to pay for the service.

___

Is there a tech challenge you need help figuring out? Write to us at onetechtip@ap.org with your questions.
