Distributed data store

Computer network with multiple nodes to store information

title: "Distributed data store" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["data-management", "distributed-data-storage", "distributed-data-stores"] description: "Computer network with multiple nodes to store information" topic_path: "general/data-management" source: "https://en.wikipedia.org/wiki/Distributed_data_store" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0

::summary Computer network with multiple nodes to store information ::

A distributed data store is a computer network where information is stored on more than one node, often in a replicated fashion (Yaniv Pessach, "Distributed Storage: Concepts, Algorithms, and Implementations", OL 25423189M).

Distributed databases

Distributed databases are usually non-relational databases that enable quick access to data over a large number of nodes. Some distributed databases expose rich query abilities, while others are limited to key-value store semantics. Examples of limited distributed databases are Google's Bigtable, which is much more than a distributed file system or a peer-to-peer network (Paper Trail, "Bigtable: Google's Distributed Data Store", http://the-paper-trail.org/blog/?p=86, archived 2017-07-16 at https://web.archive.org/web/20170716092550/http://the-paper-trail.org/blog/bigtable-googles-distributed-data-store, accessed 2011-04-05: "Although GFS provides Google with reliable, scalable distributed file storage, it does not provide any facility for structuring the data contained in the files beyond a hierarchical directory structure and meaningful file names. [...] The very first thing you need to know about Bigtable is that it isn’t a relational database."), Amazon's Dynamo (Sarah Pidcock, "Dynamo: Amazon's Highly Available Key-value Store", Waterloo, Cheriton School of Computer Science, 2011-01-31, p. 2/22, http://www.cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/sarah1.pdf, accessed 2011-04-05: "Dynamo: a highly available and scalable distributed data store"), and Microsoft Azure Storage.
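As a rough illustration of key-value store semantics spread over several nodes, the following minimal sketch places each key on a fixed number of nodes chosen from a hash of the key. The class name `TinyKeyValueCluster`, the `replicas` parameter, and the hash-based placement rule are assumptions made for this example; they do not describe how Bigtable, Dynamo, or Azure Storage are implemented.

```python
import hashlib


class TinyKeyValueCluster:
    """Toy in-memory cluster (hypothetical): each key lives on `replicas` nodes."""

    def __init__(self, node_names, replicas=2):
        if not 1 <= replicas <= len(node_names):
            raise ValueError("replicas must be between 1 and the node count")
        self.node_names = list(node_names)
        self.replicas = replicas
        # One dict per node stands in for that node's local storage.
        self.storage = {name: {} for name in self.node_names}

    def nodes_for(self, key):
        """Pick the nodes responsible for `key` from a stable hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.node_names)
        return [
            self.node_names[(start + i) % len(self.node_names)]
            for i in range(self.replicas)
        ]

    def put(self, key, value):
        """Write the value to every node responsible for the key."""
        for name in self.nodes_for(key):
            self.storage[name][key] = value

    def get(self, key, down=()):
        """Read from the first responsible node that is not listed in `down`."""
        for name in self.nodes_for(key):
            if name not in down:
                return self.storage[name].get(key)
        raise ConnectionError("no replica reachable for " + repr(key))
```

With two replicas per key, a read still succeeds when one of the responsible nodes is unavailable, which is the practical benefit of storing information in a replicated fashion. The only operations offered are `put` and `get` by key; there is no way to query by value, which is what "limited" means here.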

Because arbitrary querying matters less than availability in these systems, designers of distributed data stores have increased availability at the expense of consistency. High-speed read/write access comes with reduced consistency because, as the CAP theorem states, it is not possible to guarantee both consistency and availability on a partitioned network.
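The trade-off can be made concrete with a deliberately simplified sketch of two replicas whose connecting link can be cut. The names `TwoNodeStore`, `Replica`, and the `"AP"`/`"CP"` mode strings are hypothetical and exist only for this illustration; real systems use far more elaborate replication protocols.

```python
class Replica:
    """Local storage of a single node."""

    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Two replicas joined by a link that can be cut (a network partition)."""

    def __init__(self, mode):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False

    def write(self, replica_index, key, value):
        """Return True if the write was accepted, False if it was refused."""
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for replica in self.replicas:
                replica.data[key] = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse the write, giving up availability.
            return False
        # Availability first: accept locally; the replicas may now disagree.
        self.replicas[replica_index].data[key] = value
        return True

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)
```

During a partition the `"AP"` store keeps accepting writes, so the two replicas can return different values for the same key, while the `"CP"` store keeps both replicas identical by rejecting writes until the link is restored.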

Peer network node data stores

In peer network data stores, the user can usually reciprocate and allow other users to use their computer as a storage node as well. Information may or may not be accessible to other users depending on the design of the network.

Most peer-to-peer networks do not have distributed data stores in that the user's data is only available when their node is on the network. However, this distinction is somewhat blurred in a system such as BitTorrent, where it is possible for the originating node to go offline but the content to continue to be served. Still, this is only the case for individual files requested by the redistributors, as contrasted with networks such as Hyphanet, Winny, Share and Perfect Dark where any node may be storing any part of the files on the network.

Distributed data stores typically use an error detection and correction technique. Some distributed data stores (such as Parchive over NNTP) use forward error correction techniques to recover the original file when parts of that file are damaged or unavailable. Others try again to download that file from a different mirror.
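The idea behind forward error correction can be sketched with a single XOR parity block, which is a much simpler code than the ones Parchive actually uses and is shown here only as an assumption-laden illustration. The function names `xor_blocks`, `add_parity`, and `recover` are hypothetical. The sketch handles one unavailable block (marked as `None`); it does not detect silent corruption, which would need a separate checksum.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def add_parity(data_blocks):
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored_blocks):
    """Rebuild the data blocks when at most one stored block is None."""
    missing = [i for i, block in enumerate(stored_blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one missing block")
    blocks = list(stored_blocks)
    if missing:
        # XOR of all surviving blocks equals the one that was lost.
        present = [block for block in blocks if block is not None]
        blocks[missing[0]] = xor_blocks(present)
    return blocks[:-1]  # drop the parity block
```

Because the parity block is the XOR of all data blocks, any single lost block equals the XOR of the survivors, so the file can be rebuilt without asking another node for the missing part. Codes that tolerate several lost blocks need more than one redundancy block.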

Examples

Distributed non-relational databases

::data[format=table]

Product	License	High availability	Notes
Apache Accumulo
Aerospike
Apache Cassandra			formerly used by Facebook
Apache Ignite
Bigtable			used by Google
Couchbase			used by LinkedIn, PayPal, and eBay
CrateDB
Apache Druid			used by Netflix and Yahoo
Dynamo			used by Amazon
etcd
Hazelcast
HBase			formerly used by Facebook
Hypertable			used by Baidu
MongoDB
MySQL NDB Cluster			SQL and NoSQL APIs
Riak
Redis
ScyllaDB
Voldemort			used by LinkedIn
::

Peer network node data stores

BitTorrent
Blockchain (database)
Chord project
Hyphanet (formerly Freenet)
GNUnet
IPFS
Mnet
Napster
NNTP (the distributed data storage protocol used for Usenet news)
Unity, the network used by the Perfect Dark software
Share
Siacoin
DeNet
Storage@home
Tahoe-LAFS
Winny
ZeroNet

References

(2011-09-16). "Windows Azure Storage".

::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::