What’s Sharding and Why is it Important in the Blockchain World?

Ishaal Ali
Published in CryptoStars
9 min read · Jan 1, 2022


You must be hiding under a rock on Mars if you haven’t heard the words Cryptocurrency, Bitcoin or Blockchain by now. Stories of people retiring on the money they made owning and selling bitcoins are inspirational for some, and a source of envy for most. But what are some of the technological pieces that make cryptocurrencies work? What strategies make the massive number of transactions manageable? There are many pieces that make the cryptocurrency world work, but one that helps make the large number of transactions possible is sharding, which is the focus of this article.

Article Outline:

  • Overview of Blockchain
  • Sharding- How it Works
  • Safety Concerns with Sharding
  • Why is Sharding Beneficial?
  • Common Strategies used when Sharding

If you are just getting started in blockchain, this may not be the best article for you, as it focuses more on sharding as a strategy than on blockchain itself.

With that being said, let’s get into it!

What is Blockchain?

Because one of the major uses of sharding is in blockchain, I think it’s beneficial to give an overview of what blockchain is. It’s essentially a shared ledger, an online database. What makes this database unique is the use of three concepts: Blocks, Hash, and Distribution, which I’ll explain further below.

Blocks

A blockchain is a growing list of records, called blocks, which are linked together. Each block has a hash, and also has the hash of the previous block. All the blocks are stored in a distributed way that makes it very difficult to alter any block of data without others realizing that the data has changed, thus preventing unauthorized changes to the data.

Hash

A hash is a calculated key in each block. It’s like a fingerprint; in practice, no two blocks have the same hash. You can think of the hash and block as two parts of a whole; tampering with anything inside the block will cause the hash to change as well. This will make the chain invalid because the “Previous hash” of the next block won’t match the hash of this block anymore.

For example, in the picture below, if Block 2 were changed, its hash would no longer be 6BQ1, so Block 3’s “Previous hash” would no longer match the hash of Block 2. This mismatch makes the whole chain invalid.

Image Source: How does a Blockchain work- Simply Explained (YouTube Channel)
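The linking described above can be sketched in a few lines of Python. This is only an illustration: the function names (`block_hash`, `build_chain`, `is_valid`), the sample transactions, and the choice of SHA-256 over a simple string are assumptions made for the example, not how any particular blockchain formats its blocks.

```python
import hashlib

def block_hash(data: str, previous_hash: str) -> str:
    """Fingerprint of a block: SHA-256 over the previous hash plus the block's data."""
    return hashlib.sha256((previous_hash + data).encode("utf-8")).hexdigest()

def build_chain(records):
    """Turn a list of records into blocks that each store the previous block's hash."""
    chain = []
    previous_hash = "0" * 64  # placeholder "previous hash" for the very first block
    for data in records:
        block = {
            "data": data,
            "previous_hash": previous_hash,
            "hash": block_hash(data, previous_hash),
        }
        chain.append(block)
        previous_hash = block["hash"]
    return chain

def is_valid(chain) -> bool:
    """Recompute every hash and check every link back to the previous block."""
    previous_hash = "0" * 64
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False  # the link to the previous block is broken
        if block["hash"] != block_hash(block["data"], block["previous_hash"]):
            return False  # the block's contents no longer match its hash
        previous_hash = block["hash"]
    return True

chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dan 1"])
print(is_valid(chain))   # True

chain[1]["data"] = "Bob pays Carol 200"   # tamper with Block 2
print(is_valid(chain))   # False
```

Even if the person tampering also recalculated Block 2's own hash, Block 3 would still hold the old value as its “Previous hash”, so the check would fail one block later.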

Distributed Ledger

A distributed ledger is any database which can be replicated and shared among the members of a decentralized network. It’s a highly secure database, and it eliminates the need for a third party (a bank, for example) to validate the authenticity of the data. Blockchain is a type of distributed ledger.

To further prevent unauthorized alteration, blockchains use a mechanism such as proof-of-work (PoW) or proof-of-stake (PoS) to validate transactions as well as to issue new blocks. Proof-of-work deliberately makes calculating a valid hash for a block slow and costly, so anyone who changes a block has to redo that work. Proof-of-stake instead has validators put up their own coins as collateral, which they can lose if they approve fraudulent data. Either way, a fraudulent change can be detected and rejected before it gets replicated to all copies of the data in the distributed network.
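For proof-of-work specifically, the “delay” can be sketched as a guessing game: keep trying different numbers until the block's hash starts with a required number of zeros. The function name `mine`, the `difficulty` parameter, and the way the inputs are joined into one string are assumptions made for this sketch; real networks encode blocks and difficulty targets differently.

```python
import hashlib

def mine(data: str, previous_hash: str, difficulty: int = 3):
    """Try nonces one by one until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        candidate = f"{previous_hash}{data}{nonce}".encode("utf-8")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest  # found a nonce that satisfies the rule
        nonce += 1

nonce, digest = mine("Alice pays Bob 5", "0" * 64)
print(nonce, digest)
```

Each extra zero required makes the search about 16 times longer on average, which is what makes rewriting an old block (and every block after it) so expensive.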

Blockchain is usually used in places with a high possibility of theft or fraud, in place of an intermediary or middleman, and in environments with high throughput. It’s also the technology behind cryptocurrencies, which are an online medium of exchange. In simple words, a cryptocurrency is money with real value that can be traded digitally without having a third party, like a bank, involved in the transaction.

The first blockchain was launched over 10 years ago by the person or group using the pseudonym Satoshi Nakamoto, as part of Bitcoin, one of the biggest cryptocurrencies in the world.

Sharding- What is it?

Now that we know what blockchain is, let’s get into sharding. The word shard itself means “a small part of a whole”. Sharding is just that: splitting a single database into smaller databases that are stored separately.

Say you needed to find the information of someone in your Book Club, which had 20,000 people in it. Instead of searching through 20,000 people by yourself, it would be much faster if you got 5 people to each search through 4,000 people.

Image Source: Database, Oracle Sharding Adventure (Website)

It’s the same thing with sharding, where each database only has a subset of the total number of records, making it easier to sort through.

Why is it beneficial?

  • If a dataset is too large to fit in one database, sharding can fix that problem by splitting the data across other databases
  • It can give you an almost unlimited storage capacity
  • Processes more transactions per second (can reduce the slowness of a database)

Concerns with sharding

Though sharding sounds great (and for the most part it is), the idea of using sharding in blockchain is relatively new. Although sharding can help with the speed and workload of a database, there is the risk of being hacked. It’s very difficult to hack into a whole blockchain, but not as hard to hack a shard, because each shard is protected by only a portion of the network. A hacker could potentially introduce false or fake transactions, which would ruin the authenticity of the blockchain.

Blockchains using proof-of-stake could potentially fix this problem by randomly assigning nodes (the computers that take part in the network) to each shard and then re-assigning them after a certain time period.
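As a rough sketch of that idea, the snippet below shuffles a list of nodes and deals them out to shards, then repeats the shuffle for each new time period. The names (`assign_nodes`, `node-0`, the three “epochs”) are invented for illustration, and Python's `random` module stands in for the kind of shared, tamper-resistant randomness a real network would need.

```python
import random

def assign_nodes(nodes, shard_count, rng):
    """Shuffle the nodes, then deal them out to the shards in turn."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    shards = {shard: [] for shard in range(shard_count)}
    for position, node in enumerate(shuffled):
        shards[position % shard_count].append(node)
    return shards

rng = random.Random()  # assumption: a stand-in for randomness no single node controls
nodes = [f"node-{i}" for i in range(12)]

for epoch in range(3):  # re-assign at the start of every (hypothetical) time period
    print(epoch, assign_nodes(nodes, shard_count=4, rng=rng))
```

Because an attacker can't predict or choose which shard their nodes will land in next, it becomes much harder to gather enough of them in one shard to take it over.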

Common Sharding Strategies

Like most things, there are common strategies people like to use while sharding. I’ll explain three of these strategies below.

Image Source: “What is Sharding” — Hazelcast.com

Horizontal Based Sharding

This strategy is probably the most common, and includes splitting a table in a single database into smaller tables in multiple databases. Let me explain what this means. Say we had information about 20,000 customers in a single table on a database server. Sharding would split this one table, and put information of the customers into different databases on different servers. This is also called horizontal-based sharding or horizontal-partitioning. In the picture above, the original table has all the rows, whereas in tables HS1 and HS2, the rows have been split into two different tables.

However, this strategy may not work well if splitting the data horizontally leads to different numbers of rows in different databases. For example, if customer data were divided into different databases based on the first letter of their last names, with last names beginning with letters A-H going to database 1, last names beginning with letters I-P going to database 2 and the rest going to database 3, each database could end up with an unequal number of rows.
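Here is a small sketch of that last-name rule. The function name `shard_for_last_name` and the ten sample names are made up for the example; the point is only to show how uneven the split can get.

```python
from collections import Counter

def shard_for_last_name(last_name: str) -> int:
    """Rule from the example: A-H -> database 1, I-P -> database 2, the rest -> database 3."""
    first = last_name[0].upper()
    if "A" <= first <= "H":
        return 1
    if "I" <= first <= "P":
        return 2
    return 3

# Made-up sample of customer last names, purely for illustration.
customers = ["Adams", "Brown", "Chen", "Davis", "Evans",
             "Garcia", "Hall", "Jones", "Khan", "Smith"]

rows_per_database = Counter(shard_for_last_name(name) for name in customers)
print(dict(rows_per_database))   # {1: 7, 2: 2, 3: 1}
```

With this sample, database 1 ends up holding seven of the ten rows, so it does most of the work while the other two sit nearly empty.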

Vertical Based Sharding

Vertical-based sharding splits the data by columns rather than rows. Say you wanted to split up the data for each book club member. Database 1 could have their identification number (to identify them) along with their position in the club, while Database 2 could have their ID and full name, and Database 3 could have their ID and phone number. This way you would be able to split all the data equally and would have an equal number of columns in each database. In the picture above, the original table has all the columns, whereas, in tables VS1 and VS2, the columns have been split into two different tables.

The catch is that if you had too many columns on too many databases, you would have to go to each database for each piece of information. Say I wanted to contact the Vice-President of the book club. I would first have to go to Database 1 to find the ID number of the VP. Then I would have to go to Database 2 to match the ID with the name. Lastly, I’d have to go to Database 3 to find their phone number. This whole process would be very time consuming, especially if you wanted the information for more than one person.
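The three-step lookup above can be sketched with three Python dictionaries standing in for the three databases. Every ID, name, and phone number here is invented for the example, and `contact_for_position` is a hypothetical helper.

```python
# Three "databases", each holding the member ID plus one column.
# All values below are invented for illustration.
positions_db = {101: "President", 102: "Vice-President", 103: "Member"}
names_db = {101: "Amira Khan", 102: "Leo Martin", 103: "Sara Lee"}
phones_db = {101: "555-0101", 102: "555-0102", 103: "555-0103"}

def contact_for_position(position: str):
    """Return (name, phone) for a position; takes one lookup per database."""
    # Lookup 1: scan Database 1 for the ID of the member holding this position.
    member_id = next(
        (mid for mid, pos in positions_db.items() if pos == position), None
    )
    if member_id is None:
        return None
    # Lookups 2 and 3: use the ID to fetch the name and the phone number.
    return names_db[member_id], phones_db[member_id]

print(contact_for_position("Vice-President"))   # ('Leo Martin', '555-0102')
```

In a real setup each of those lookups would be a separate trip to a separate server, which is where the time goes.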

Hash Based Sharding

Another common strategy is hash based sharding, which splits the data based on a calculated hash. First, the existing data is split evenly among all the available databases, and then a method is created for deciding which server any new data should be added to.

Image Source: System Design Interview Concepts — Database Sharding (Deb Haldar)

In the picture above, we have 4 database servers. Let’s assume that each request to add new data is accompanied by a key that gets incremented by 1 with each new request. Applying a simple modulo operation to the key with the number 4 (that is, taking the remainder after dividing the key by 4) gives a result from 0 to 3. This result can be used to direct the new request to the appropriate database. Database 0 keeps any data whose key is 0 or 4, Database 1 keeps keys 1 and 5, Database 2 keeps keys 2 and 6, and Database 3 keeps keys 3 and 7. This way, new data continues to be evenly distributed across the database servers.

A disadvantage of this is that if we wanted to add another database (let’s call it Server 4), we would have to re-number and reorganize the existing data so that Server 4 gets its share. The method of assigning new data to the servers would also have to change, for example from modulo 4 to modulo 5. And if you wanted to add two more databases, well, I can only imagine how time-consuming that would be.
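The modulo rule for four servers fits in a couple of lines of Python. The names `NUM_DATABASES` and `database_for_key` are just labels chosen for this sketch.

```python
NUM_DATABASES = 4

def database_for_key(key: int, num_databases: int = NUM_DATABASES) -> int:
    """Pick a database by taking the remainder of the key divided by the server count."""
    return key % num_databases

# Route keys 0 through 7 and record which database each one lands in.
databases = {index: [] for index in range(NUM_DATABASES)}
for key in range(8):
    databases[database_for_key(key)].append(key)

print(databases)   # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```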
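To get a feel for how much reorganizing that means, this sketch counts how many of the first 20 keys would land on a different server if the rule changed from 4 servers to 5. The helper `moved_keys` and the choice of 20 keys are assumptions made for the example.

```python
def moved_keys(keys, old_count, new_count):
    """Keys whose server changes when the number of servers changes."""
    return [key for key in keys if key % old_count != key % new_count]

keys = range(20)
moved = moved_keys(keys, old_count=4, new_count=5)
print(len(moved), "of", len(keys), "keys change servers")   # 16 of 20 keys change servers
```

Only keys 0 to 3 stay where they were; everything else has to be copied to a different server, even though just one server was added.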

Key Takeaways

  • Blockchain is the technology behind cryptocurrencies. It’s an online database, and it’s a type of shared ledger
  • Sharding means splitting a single database into many smaller ones, giving you almost unlimited storage
  • It’s much easier to hack into a shard than a blockchain, and there aren’t too many options to prevent this at the moment
  • There are many strategies you can use to shard, some of these being horizontal based sharding, vertical based sharding, and hash based sharding

Conclusion

Blockchain and cryptocurrency are today’s buzzwords that can turn people’s attention toward you as soon as you say them. They could also be great ice breakers if you have sufficient knowledge to talk about them and are surrounded by people interested in them. However, the technologies that make them possible are not entirely new.

One of the key concepts used to help blockchains handle more transactions is sharding. There are several strategies to “shard” data. Each of the strategies has its pros and cons. The specific strategy used is dependent on the application and must be carefully selected based on the usage scenario of the application. While it is not easy to fraudulently alter data in a blockchain, it might be easier to do so in a smaller shard.

It is also important to note that while there are many people in the world now who mine for fractions of a bitcoin or other cryptocurrencies because of the exponential growth in their value in the last couple of years, the technologies that make all of this possible are simple enough for anyone to understand.

What excites me more though, is that cryptocurrencies are not the only application of this technology. I am confident that in the near future, we will see other interesting applications of these — hopefully ones that are not just used for material gains but can also improve the quality of life of us humans!

Extra Resources

Here are some of the articles/websites I used to make this article. Check them out!

  1. Why is Blockchain Important and Why Does it Matters?
  2. Sharding and the Future of Blockchain
  3. How Sharding Works
  4. What is Sharding?
  5. System Design Interview Concepts — Database Sharding

If you enjoyed reading this article, feel free to connect with me on LinkedIn here.

I also have a newsletter which gets a new edition every month. I would love for you to join my journey!

Thanks for reading this article, see you next time!
