Comparative Analysis of Major Distributed File System Architectures: GFS vs. Tectonic vs. JuiceFS

Key Takeaways

  • Designing distributed file systems that maintain POSIX compatibility is challenging and often requires trade-offs.
  • GFS introduced a decoupled architecture comprising a master, chunkservers, and clients, and became the foundation of many other big data systems.
  • Tectonic employs a layered metadata design, enabling the separation of storage and compute for metadata. This innovative approach enhances scalability and performance.
  • JuiceFS uses cost-effective, robust object storage services for data storage, while employing open-source databases as its metadata engine. This aligns with the demands of cloud computing.
  • Distributed file systems play a crucial role in enabling scalable, reliable, and performant data storage and processing, driving innovation in the field of big data and cloud-native solutions.

As technology advances and data continues to explode, traditional disk file systems have revealed their limitations. To address the growing storage demands, distributed file systems have emerged as dynamic and scalable solutions.

In this article, we explore the design principles, innovations, and challenges addressed by three representative distributed file systems: Google File System (GFS), Tectonic, and JuiceFS.

  • GFS pioneered commodity hardware use and influenced systems like Hadoop Distributed File System (HDFS) in big data.
  • Tectonic introduced layered metadata and storage/compute separation, improving scalability and performance.
  • JuiceFS, designed for the cloud-native era, uses object storage and a versatile metadata engine for scalable file storage in the cloud.

By exploring the architectures of these three systems, you will gain valuable insights into designing distributed file systems.

This understanding can guide enterprises in choosing suitable file systems.

We aim to inspire professionals and researchers in big data, distributed system design, and cloud-native technologies with knowledge to optimize data storage, stay informed about industry trends, and explore practical applications.

An overview of popular distributed file systems

The table below shows a variety of widely-used distributed file systems, both open-source and proprietary.

[Table: widely used distributed file systems, open-source and proprietary]

As shown in the table, a large number of distributed systems emerged around the year 2000. Before this period, shared storage, parallel file systems, and distributed file systems existed, but they often relied on specialized and expensive hardware.

The “POSIX-compatible” column in the table indicates whether the distributed file system is compatible with the Portable Operating System Interface (POSIX), a set of operating system standards that includes file system behavior. A POSIX-compatible file system must implement all the features defined in the standard, not just a subset.

For example, GFS is not a POSIX-compatible file system. Google made several trade-offs when it designed GFS: it discarded many disk file system features and kept only the distributed storage capabilities that Google’s search engine needed at the time.

In the following sections, we’ll focus on the architecture design of GFS, Tectonic, and JuiceFS. Let’s explore the contributions of each system and how they have transformed the way we handle data.

GFS Architecture

In 2003, Google published the GFS paper. It demonstrated that we can use cost-effective commodity computers to build a powerful, scalable, and reliable distributed storage system, entirely based on software, without relying on proprietary or expensive hardware resources.

GFS significantly lowered the barrier to entry for distributed file systems, and its influence can be seen to varying degrees in many subsequent systems. HDFS, the open-source distributed file system developed at Yahoo, draws heavily on the design principles and ideas presented in the GFS paper and has become one of the most popular storage systems in the big data domain. Although GFS was published in 2003, its design is still relevant and widely used today.

The following figure shows the GFS architecture:

[Figure: GFS architecture. Source: The Google File System]

A GFS cluster consists of:

  • A single master, which serves as the metadata node. It keeps the file system’s metadata, such as the directory tree, file attributes, and permissions, in one central place.
  • Multiple chunkservers, which store the file data. Each chunkserver relies on the local operating system’s file system to persist its chunks.
  • Multiple clients, through which applications access the file system.

The master and chunkservers communicate over the network, together forming a distributed file system, and the chunkservers can be scaled horizontally as data grows.

All components in GFS are interconnected. When a client initiates a request, it first retrieves the file’s metadata from the master, then communicates directly with the relevant chunkservers to read or write the data.

GFS stores files in fixed-size chunks, usually 64 MB, each with multiple replicas to ensure data reliability. Because a file’s chunks may be spread across many machines, reading a single file can involve communicating with several chunkservers. The replica mechanism is a classic distributed file system design, and many open-source implementations today inherit it from GFS.
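To make this flow concrete, here is a minimal, self-contained sketch of the GFS read path in Python. The Master and Chunkserver classes are toy in-memory stand-ins (the real RPC interfaces are internal to Google); only the control flow and data flow they model follow the paper.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS stores files in fixed-size 64 MB chunks


class Master:
    """Metadata node: maps (file, chunk index) -> (chunk handle, replicas)."""

    def __init__(self):
        self.chunk_table = {}  # (filename, chunk_index) -> (handle, [servers])

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]


class Chunkserver:
    """Data node: stores chunk contents keyed by chunk handle."""

    def __init__(self):
        self.chunks = {}  # handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


def gfs_read(master, filename, offset, length):
    # 1. Ask the master which chunk covers this offset and where its
    #    replicas live. Only metadata flows through the master.
    handle, replicas = master.lookup(filename, offset // CHUNK_SIZE)
    # 2. Fetch the bytes from any live replica; the master never serves data.
    for server in replicas:
        try:
            return server.read(handle, offset % CHUNK_SIZE, length)
        except KeyError:
            continue  # this replica lost the chunk; try the next one
    raise IOError("no live replica holds the chunk")


master, cs = Master(), Chunkserver()
cs.chunks["h1"] = b"x" * 100
master.chunk_table[("/logs/app.log", 0)] = ("h1", [cs])
assert gfs_read(master, "/logs/app.log", 10, 5) == b"xxxxx"
```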

While GFS was groundbreaking in its own right, it had limitations in terms of scalability. To address these issues, Google developed Colossus as an improved successor to GFS. Colossus provides storage for various Google products and serves as the underlying storage platform for Google Cloud services, which indirectly makes its capabilities available to the public. With enhanced scalability and availability, Colossus is designed to handle the rapidly growing data demands of modern applications.

Tectonic Architecture

Tectonic is the largest distributed file system used at Meta (formerly Facebook). This project, originally called Warm Storage, began in 2014, but its complete architecture was not publicly released until 2021.

Prior to developing Tectonic, Meta primarily used HDFS, Haystack, and f4 for data storage:

  • HDFS was used for data warehousing, but a single HDFS cluster’s storage capacity was limited, so dozens of clusters had to be deployed.
  • Haystack and f4 were used for unstructured data storage scenarios.

Tectonic was designed to support the workloads of all three systems in a single cluster.

The figure below shows the Tectonic architecture:

[Figure: Tectonic architecture. Source: Facebook’s Tectonic Filesystem: Efficiency from Exascale]

Tectonic consists of three components:

  • The Client Library
  • The Metadata Store
  • The Chunk Store

Innovations in Tectonic’s architecture design

Innovation #1: Layered metadata

Tectonic abstracts the metadata of the distributed file system into a simple key-value (KV) model. This allows for excellent horizontal scaling and load balancing, and effectively prevents hotspots in data access.

Tectonic introduces a hierarchical approach to metadata, setting it apart from traditional distributed file systems. The Metadata Store is divided into three layers, which correspond to the data structures in the underlying KV storage:

  • The Name layer, which stores the metadata related to the file name or directory structure, sharded by directory IDs
  • The File layer, which stores the file attributes, sharded by file IDs
  • The Block layer, which stores the metadata regarding the location of data blocks in the Chunk Store, sharded by block IDs

The figure below summarizes the key-value mapping of the three layers:

[Figure: key-value mapping of the three metadata layers. Source: Facebook’s Tectonic Filesystem: Efficiency from Exascale]
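To make the mapping concrete, here is an illustrative Python model of the three layers as plain key-value maps. The key and value shapes are simplified for exposition; the paper’s actual schema carries more fields (permissions, ownership, chunk metadata, and so on), and the IDs below are invented.

```python
# Name layer: directory structure, sharded by directory ID.
#   key: (dir_id, child_name) -> value: ("dir", subdir_id) or ("file", file_id)
name_layer = {
    (1, "logs"): ("dir", 2),
    (2, "app.log"): ("file", 1001),
}

# File layer: file attributes and block lists, sharded by file ID.
#   key: file_id -> value: attributes plus an ordered list of block IDs
file_layer = {
    1001: {"size": 128 * 2**20, "blocks": [5001, 5002]},
}

# Block layer: block -> chunk locations in the Chunk Store, sharded by block ID.
#   key: block_id -> value: list of (storage_node, chunk_id)
block_layer = {
    5001: [("node-17", "chunk-a"), ("node-42", "chunk-b")],
    5002: [("node-03", "chunk-c"), ("node-99", "chunk-d")],
}


def resolve(path):
    """Walk a path through all three layers to its chunk locations."""
    dir_id = 1  # root directory ID (assumed)
    parts = path.strip("/").split("/")
    for name in parts[:-1]:
        _, dir_id = name_layer[(dir_id, name)]       # Name layer: descend dirs
    _, file_id = name_layer[(dir_id, parts[-1])]     # Name layer: final lookup
    blocks = file_layer[file_id]["blocks"]           # File layer: block list
    return [block_layer[b] for b in blocks]          # Block layer: locations


print(resolve("/logs/app.log"))
```

Because each layer is sharded on a different ID, lookups and load spread out naturally instead of concentrating on a single metadata server.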

This layered design addresses the scalability and performance demands of Tectonic, especially in Meta’s scenarios, where handling exabyte-scale data is required.

Innovation #2: Separation of storage and compute for metadata

The three metadata layers are stateless and can be scaled horizontally according to workload. They communicate over the network with the Key-Value Store, the stateful storage component of the Metadata Store.

The Key-Value Store was not developed by the Tectonic team itself; instead, Tectonic uses ZippyDB, a distributed KV store built within Meta on top of RocksDB and the Paxos consensus algorithm. Tectonic relies on ZippyDB’s KV storage and its transactions to ensure the consistency and atomicity of the file system’s metadata.

Transactional functionality plays a vital role in implementing a large-scale distributed file system. Horizontally scaling the Metadata Store is essential to meet the demands of such a system, but horizontal scaling brings data sharding, and file system semantics still demand strong consistency across operations. This is most acute for operations like renaming a directory with many subdirectories; keeping a rename both efficient and consistent is a significant and widely recognized challenge in distributed file system design.

To address this challenge, Tectonic uses ZippyDB’s transactional features. When handling metadata operations within a single shard, Tectonic guarantees both transactional behavior and strong consistency.

However, ZippyDB does not support cross-shard transactions. This limits Tectonic’s ability to ensure atomicity when it processes metadata requests that span multiple directories, such as moving files between directories.
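The sketch below illustrates the gap. Name-layer entries are sharded by directory ID, so the source and destination directories of a move may land on different shards; the shard placement function and the in-memory “shards” here are invented purely for illustration.

```python
NUM_SHARDS = 8
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for a shard


def shard_of(dir_id):
    return dir_id % NUM_SHARDS  # toy placement: shard chosen by directory ID


def move_file(src_dir, dst_dir, name, file_id):
    src, dst = shard_of(src_dir), shard_of(dst_dir)
    if src == dst:
        # Same shard: the delete and insert can run inside one ZippyDB
        # transaction, so the move is atomic.
        del shards[src][(src_dir, name)]
        shards[dst][(dst_dir, name)] = file_id
    else:
        # Different shards: no single transaction spans both, so the move
        # becomes a multi-step protocol. Between the two steps, an observer
        # could briefly see the file in both directories.
        shards[dst][(dst_dir, name)] = file_id   # step 1: insert at destination
        del shards[src][(src_dir, name)]         # step 2: delete at source


shards[shard_of(1)][(1, "a.txt")] = 42
move_file(src_dir=1, dst_dir=2, name="a.txt", file_id=42)
```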

Innovation #3: Erasure coding in the Chunk Store

As previously mentioned, GFS ensures data reliability through multiple replicas, but this approach comes with high storage costs. For example, storing just 1 TB of data with three replicas consumes at least 3 TB of raw capacity, and the overhead becomes enormous for a system like Meta’s operating at the exabyte level.

To solve this problem, Meta implements erasure coding (EC) in the Chunk Store, which achieves comparable data reliability with far less redundancy, typically around 1.2 to 1.5 times the original data size. This offers substantial cost savings compared to the traditional three-replica method, and Tectonic’s EC design is flexible, allowing the coding scheme to be configured on a per-chunk basis.

While EC effectively ensures data reliability with minimal storage space, it does have some drawbacks. Specifically, reconstructing lost or corrupted data incurs high computational and I/O resource requirements.
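A toy single-parity code (RAID-4-style XOR) makes both effects visible: the storage saving, and the need to read all surviving blocks to rebuild a lost one. Tectonic itself uses Reed-Solomon codes, which generalize this idea to multiple parity blocks; the block contents below are made up.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # 4 data blocks
parity = xor_blocks(data)                    # 1 parity block
# Overhead: 5 blocks stored for 4 blocks of data = 1.25x the original size,
# versus 3x for triple replication.

# Reconstructing a lost block requires reading ALL surviving blocks and
# recomputing -- this is the extra I/O and compute cost of erasure coding.
lost = 2
survivors = [b for i, b in enumerate(data) if i != lost] + [parity]
assert xor_blocks(survivors) == data[lost]
```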

According to the Tectonic research paper, the largest Tectonic cluster at Meta comprises approximately 4,000 storage nodes, with a total capacity of about 1,590 petabytes and roughly 10 billion files. This scale is substantial for a distributed file system and generally fulfills the requirements of most current use cases.

JuiceFS Architecture

JuiceFS was born in 2017, a time when significant changes had occurred in the external landscape compared to the emergence of GFS and Tectonic:

  • There had been remarkable advancements in hardware resources. To put it into perspective, when GFS emerged, the network bandwidth in Google’s data centers was merely 100 Mbps. Today, on AWS, machine network bandwidth can reach up to 100 Gbps, a thousandfold increase.
  • Cloud computing had become mainstream, with enterprises transitioning into the “cloud era” through public, private, or hybrid clouds.

This shift presented new challenges for infrastructure architecture. Migrating traditional designs built for on-premises data center (IDC) environments to the cloud often surfaced problems, and making the most of cloud computing’s benefits became a crucial requirement for integrating infrastructure smoothly into cloud environments.

Moreover, GFS and Tectonic were in-house systems serving specific company operations, operating at a large scale but with a narrow focus. In contrast, JuiceFS is designed to cater to a wide range of public-facing users and to meet diverse use case requirements. As a result, the architecture of JuiceFS differs significantly from the other two file systems.

Taking these changes and distinctions into account, let’s look at the JuiceFS architecture as shown in the figure below:

[Figure: JuiceFS architecture. Source: JuiceFS Architecture]

JuiceFS consists of three components:

  • The Metadata Engine
  • The Data Storage
  • The Client

While JuiceFS shares a similar overall framework with the aforementioned systems, it distinguishes itself through various design aspects.

Data Storage

Unlike GFS and Tectonic, which rely on proprietary data storage that their operators must run themselves, JuiceFS follows the trend of the cloud-native era by using object storage. As previously mentioned, Meta’s Tectonic cluster uses some 4,000 storage nodes to handle exabyte-scale data, which inevitably incurs significant operational costs for managing a cluster of that size.

For regular users, object storage has several advantages:

  • Out-of-the-box usability
  • Elastic capacity
  • Simplified operations and maintenance
  • Support for erasure coding, resulting in lower storage costs compared to replication

However, object storage has limitations, including:

  • Object immutability
  • Poor metadata performance
  • Absence of strong consistency
  • Limited random read performance

To tackle these challenges, JuiceFS adopts the following strategies in its architectural design:

  • An independent metadata engine
  • A three-layer data architecture comprising chunks, slices, and blocks (see the sketch after this list)
  • Multi-level caching
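As a rough illustration of the chunk/slice/block layout, the sketch below maps a file offset to its position in storage. The 64 MiB chunk and 4 MiB block sizes follow the JuiceFS documentation, but the mapping function is simplified; in particular, it ignores how overlapping slices within a chunk are resolved by the metadata engine.

```python
CHUNK_SIZE = 64 * 2**20  # logical chunks: fixed 64 MiB windows of the file
BLOCK_SIZE = 4 * 2**20   # physical blocks: objects of up to 4 MiB each


def locate(offset):
    """Map a file offset to (chunk index, block index, offset in block)."""
    chunk_index = offset // CHUNK_SIZE           # which chunk of the file
    offset_in_chunk = offset % CHUNK_SIZE
    block_index = offset_in_chunk // BLOCK_SIZE  # which block within the chunk
    offset_in_block = offset_in_chunk % BLOCK_SIZE
    return chunk_index, block_index, offset_in_block


# Because blocks are immutable objects, an overwrite never modifies an
# existing object: it writes new blocks as a new slice, and the metadata
# engine records which slice is current for each byte range.
print(locate(200 * 2**20))  # byte 200 MiB -> (chunk 3, block 2, offset 0)
```

This layering is how JuiceFS works around object immutability and weak random-write performance: all mutation happens in metadata, while the objects themselves are only ever written once.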

Metadata Engine

JuiceFS supports various open-source databases as its underlying storage for metadata. This is similar to Tectonic, but JuiceFS goes a step further by supporting not only distributed KV stores but also Redis, relational databases, and other storage engines. This design has these advantages:

  • It allows users to choose the most suitable solution for their specific use cases, aligning with JuiceFS’s goal of being a versatile file system.
  • Open-source databases often offer fully managed services in public clouds, resulting in almost zero operational costs for users.

Tectonic achieves strong metadata consistency by using ZippyDB, a transactional KV store, but its transactionality is limited to metadata operations within a single shard. JuiceFS has stricter requirements: it demands global strong consistency, so every database integrated as a metadata engine must support transactions. With a horizontally scalable metadata engine such as TiKV, JuiceFS can store over 20 billion files in a single file system, making it well suited to enterprises with massive data storage needs.
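The following sketch shows why transactions matter for metadata: a rename must update two directory entries atomically. SQLite stands in for the metadata engine here, and the single dentry table is an invented schema for illustration, not JuiceFS’s actual layout.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dentry (parent INTEGER, name TEXT, inode INTEGER, "
           "PRIMARY KEY (parent, name))")
db.execute("INSERT INTO dentry VALUES (1, 'a.txt', 42)")


def rename(src_parent, src_name, dst_parent, dst_name):
    # Both statements commit together or roll back together, so no reader
    # ever sees the file present in both directories -- or in neither.
    with db:  # the connection as a context manager wraps one transaction
        (inode,) = db.execute(
            "SELECT inode FROM dentry WHERE parent=? AND name=?",
            (src_parent, src_name)).fetchone()
        db.execute("DELETE FROM dentry WHERE parent=? AND name=?",
                   (src_parent, src_name))
        db.execute("INSERT INTO dentry VALUES (?, ?, ?)",
                   (dst_parent, dst_name, inode))


rename(1, "a.txt", 2, "b.txt")
print(db.execute("SELECT * FROM dentry").fetchall())  # [(2, 'b.txt', 42)]
```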

Client

The main differences between the JuiceFS client and the clients of the other two systems are as follows:

  • The GFS client speaks a non-standard protocol, does not support POSIX, and allows only append-style writes, which limits its usability to specific scenarios.
  • The Tectonic client likewise lacks POSIX support and permits only append-style writes, but it employs a rich-client design that places much of the functionality on the client side for maximum flexibility.
  • The JuiceFS client supports multiple standard access methods, including POSIX, HDFS, S3, WebDAV, and Kubernetes CSI.
  • The JuiceFS client also offers caching acceleration capabilities, which are highly valuable for storage separation scenarios in cloud-native architectures.

Conclusion

Distributed file systems have transformed data storage, and three notable systems stand out in this domain: GFS, Tectonic, and JuiceFS.

  • GFS demonstrated the potential of cost-effective commodity computers in building reliable distributed storage systems. It paved the way for subsequent systems and played a significant role in shaping the field.
  • Tectonic introduced innovative design principles such as layered metadata and separation of storage and compute. These advancements addressed scalability and performance challenges, providing efficiency, load balancing, and strong consistency in metadata operations.
  • JuiceFS, designed for the cloud-native era, uses object storage and a versatile metadata engine to deliver scalable file storage solutions. With support for various open-source databases and standard access methods, JuiceFS caters to a wide range of use cases and seamlessly integrates with cloud environments.

Distributed file systems overcome the limitations of traditional disk file systems, providing flexibility, reliability, and efficiency for managing large data volumes. As technology advances and data grows exponentially, their ongoing evolution reflects the industry’s commitment to efficient data management. With diverse architectures and innovative features, distributed file systems will continue to drive innovation across industries.

 
