Introduction
Odds are you've heard about the Ethereum blockchain, whether or not you know what it is. It's been in the news a lot lately, including on the cover of some major magazines, but reading those articles can be like reading gibberish if you don't have a foundation for what exactly Ethereum is. So what is it? In essence, a public database that keeps a permanent record of digital transactions. Importantly, this database doesn't require any central authority to maintain and secure it. Instead it operates as a trustless transactional system: a framework in which individuals can make peer-to-peer transactions without needing to trust a third party OR one another.
Still confused? That's where this post comes in. My aim is to explain how Ethereum functions at a technical level, without complex math or scary-looking formulas. Even if you're not a programmer, I hope you'll walk away with at least a better grasp of the tech. If some parts are too technical and difficult to grok, that's totally fine! There's really no need to understand every little detail. I recommend just focusing on understanding things at a broad level.
Many of the topics covered in this post are a breakdown of the concepts discussed in the yellow paper. I've added my own explanations and diagrams to make understanding Ethereum easier. Those brave enough to take on the technical challenge can also read the Ethereum yellow paper.
Let's get started!
A blockchain is a cryptographically secure transactional singleton machine with shared-state. [1] That's a mouthful, isn't it? Let's break it down.
Ethereum implements this blockchain paradigm.
The Ethereum blockchain is essentially a transaction-based state machine. In computer science, a state machine refers to something that will read a series of inputs and, based on those inputs, will transition to a new state.
With Ethereum's state machine, we begin with a genesis state. This is analogous to a blank slate, before any transactions have happened on the network. When transactions are executed, this genesis state transitions into some final state. At any point in time, this final state represents the current state of Ethereum.
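To make the state-machine idea concrete, here is a minimal sketch in Python. It is a toy model, not how Ethereum actually represents state: the account names, the balance-only state, and the `apply_transaction` and `run` helpers are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]              # account name -> balance (toy state)
Transaction = Tuple[str, str, int]  # (sender, recipient, amount)


def apply_transaction(state: State, tx: Transaction) -> State:
    """Return the new state produced by applying one transaction."""
    sender, recipient, amount = tx
    if state.get(sender, 0) < amount:
        raise ValueError("invalid transaction: insufficient balance")
    new_state = dict(state)  # copy, so the previous state is left untouched
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def run(genesis: State, transactions: List[Transaction]) -> State:
    """Fold a series of transactions over the genesis state."""
    state = genesis
    for tx in transactions:
        state = apply_transaction(state, tx)
    return state


# Hypothetical genesis state and transactions.
genesis_state = {"alice": 10, "bob": 0}
current_state = run(genesis_state, [("alice", "bob", 3), ("bob", "alice", 1)])
```

Each transaction is an input, and the state after the last transaction plays the role of the current state.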
The state of Ethereum has millions of transactions. These transactions are grouped into blocks. A block contains a series of transactions, and each block is chained together with its previous block.
To cause a transition from one state to the next, a transaction must be valid. For a transaction to be considered valid, it must go through a validation process known as mining. Mining is when a group of nodes (i.e. computers) expend their compute resources to create a block of valid transactions.
Any node on the network that declares itself as a miner can attempt to create and validate a block. Lots of miners from around the world try to create and validate blocks at the same time. Each miner provides a mathematical proof when submitting a block to the blockchain, and this proof acts as a guarantee: if the proof exists, the block must be valid.
For a block to be added to the main blockchain, the miner must prove it faster than any other competitor miner. The process of validating each block by having a miner provide a mathematical proof is known as a proof of work.
A miner who validates a new block is rewarded with a certain amount of value for doing this work. What is that value? The Ethereum blockchain uses an intrinsic digital token called Ether. Every time a miner proves a block, new Ether tokens are generated and awarded.
You might wonder: what guarantees that everyone sticks to one chain of blocks? How can we be sure that there doesn't exist a subset of miners who will decide to create their own chain of blocks?
Earlier, we defined a blockchain as a transactional singleton machine with shared-state. Using this definition, we can understand the correct current state is a single global truth, which everyone must accept. Having multiple states (or chains) would ruin the whole system, because it would be impossible to agree on which state was the correct one. If the chains were to diverge, you might own 10 coins on one chain, 20 on another, and 40 on another. In this scenario, there would be no way to determine which chain was the most valid.
Whenever multiple paths are generated, a fork occurs. We typically want to avoid forks, because they disrupt the system and force people to choose which chain they believe in.
To determine which path is most valid and prevent multiple chains, Ethereum uses a mechanism called the GHOST protocol.
GHOST = Greedy Heaviest Observed Subtree
In simple terms, the GHOST protocol says we must pick the path that has had the most computation done upon it. One way to determine that path is to use the block number of the most recent block (the leaf block), which represents the total number of blocks in the current path (not counting the genesis block). The higher the block number, the longer the path and the greater the mining effort that must have gone into arriving at the leaf. Using this reasoning allows us to agree on the canonical version of the current state.
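The block-number heuristic described above can be sketched in a few lines of Python. This only illustrates "pick the leaf with the highest block number"; it is not the full GHOST protocol, and the `Block`, `extend`, and `canonical_head` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    number: int                      # 0 for the genesis block
    parent: Optional["Block"] = None


def extend(parent: Block) -> Block:
    """Create a child block one step further along the path."""
    return Block(number=parent.number + 1, parent=parent)


def canonical_head(leaves: List[Block]) -> Block:
    """Pick the leaf block with the highest block number."""
    return max(leaves, key=lambda block: block.number)


genesis = Block(number=0)
a1 = extend(genesis)
a2 = extend(a1)       # path A has two blocks after genesis
b1 = extend(genesis)  # path B forks off and has only one
head = canonical_head([a2, b1])
```

Here path A wins because its leaf has the higher block number, which stands in for the greater mining effort behind it.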
Now that you've gotten the 10,000-foot overview of what a blockchain is, let's dive deeper into the main components that the Ethereum system is comprised of:
One note before getting started: whenever I say hash of X, I am referring to the KECCAK-256 hash, which Ethereum uses.
The global shared-state of Ethereum is comprised of many small objects (accounts) that are able to interact with one another through a message-passing framework. Each account has a state associated with it and a 20-byte address. An address in Ethereum is a 160-bit identifier that is used to identify any account.
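There are two types of accounts: externally owned accounts, which are controlled by private keys, and contract accounts, which are controlled by their contract code.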
It's important to understand a fundamental difference between externally owned accounts and contract accounts. An externally owned account can send messages to other externally owned accounts OR to other contract accounts by creating and signing a transaction using its private key. A message between two externally owned accounts is simply a value transfer. But a message from an externally owned account to a contract account activates the contract account's code, allowing it to perform various actions (e.g. transfer tokens, write to internal storage, mint new tokens, perform some calculation, create new contracts, etc.).
Unlike externally owned accounts, contract accounts can't initiate new transactions on their own. Instead, contract accounts can only fire transactions in response to other transactions they have received (from an externally owned account or from another contract account). We'll learn more about contract-to-contract calls in the Transactions and Messages section.
Therefore, any action that occurs on the Ethereum blockchain is always set in motion by transactions fired from externally owned accounts.
The account state consists of four components, which are present regardless of the type of account: the nonce, the balance, the storageRoot, and the codeHash.
Okay, so we know that Ethereum's global state consists of a mapping between account addresses and the account states. This mapping is stored in a data structure known as a Merkle Patricia tree.
A Merkle tree (also referred to as a Merkle trie) is a type of binary tree composed of a set of nodes with:
The data at the bottom of the tree is generated by splitting the data that we want to store into chunks, then splitting the chunks into buckets, and then taking the hash of each bucket and repeating the same process until the total number of hashes remaining becomes only one: the root hash.
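The hash-and-repeat process can be sketched as follows. This is a simplified binary Merkle tree, not Ethereum's Merkle Patricia trie. Two assumptions are worth flagging: SHA-256 stands in for KECCAK-256 (which is not in Python's standard library), and an odd number of hashes is handled by duplicating the last one, which is just one common convention.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    # Assumption: SHA-256 is a stand-in for KECCAK-256.
    return hashlib.sha256(data).digest()


def merkle_root(chunks: List[bytes]) -> bytes:
    """Hash every chunk, then hash pairs upward until one hash remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [h(chunk) for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the odd one out
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical chunks of data.
root = merkle_root([b"tx1", b"tx2", b"tx3", b"tx4"])
tampered = merkle_root([b"tx1", b"tx2", b"fake", b"tx4"])
```

Changing any single chunk at the bottom produces a different root hash, which is the property the rest of this section relies on.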
This tree is required to have a key for every value stored inside it. Beginning from the root node of the tree, the key should tell you which child node to follow to get to the corresponding value, which is stored in the leaf nodes. In Ethereum's case, the key/value mapping for the state tree is between addresses and their associated accounts, including the balance, nonce, codeHash, and storageRoot for each account (where the storageRoot is itself a tree).
This same trie structure is also used to store transactions and receipts. More specifically, every block has a header which stores the hash of the root node of three different Merkle trie structures: the state trie, the transactions trie, and the receipts trie.
The ability to store all this information efficiently in Merkle tries is incredibly useful in Ethereum for what we call light clients or light nodes. Remember that a blockchain is maintained by a bunch of nodes. Broadly speaking, there are two types of nodes: full nodes and light nodes.
A full archive node synchronizes the blockchain by downloading the full chain, from the genesis block to the current head block, executing all of the transactions contained within. Typically, miners store the full archive node, because they are required to do so for the mining process. It is also possible to download a full node without executing every transaction. Regardless, any full node contains the entire chain.
But unless a node needs to execute every transaction or easily query historical data, there's really no need to store the entire chain. This is where the concept of a light node comes in. Instead of downloading and storing the full chain and executing all of the transactions, light nodes download only the chain of headers, from the genesis block to the current head, without executing any transactions or retrieving any associated state. Because light nodes have access to block headers, which contain hashes of three tries, they can still easily generate and receive verifiable answers about transactions, events, balances, etc.
The reason this works is because hashes in the Merkle tree propagate upward: if a malicious user attempts to swap a fake transaction into the bottom of a Merkle tree, this change will cause a change in the hash of the node above, which will change the hash of the node above that, and so on, until it eventually changes the root of the tree.
Any node that wants to verify a piece of data can use something called a Merkle proof to do so. A Merkle proof consists of:
Anyone reading the proof can verify that the hashing for that branch is consistent all the way up the tree, and therefore that the given chunk is actually at that position in the tree.
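Here is a rough sketch of what checking such a proof might look like for a simple binary Merkle tree. It assumes a proof is a list of sibling hashes, each tagged with whether the sibling sits on the left, and it uses SHA-256 as a stand-in for KECCAK-256. The `verify_proof` helper and the proof layout are hypothetical, not Ethereum's actual proof format.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    # Assumption: SHA-256 is a stand-in for KECCAK-256.
    return hashlib.sha256(data).digest()


# One step of the branch: (sibling hash, True if the sibling is on the left).
ProofStep = Tuple[bytes, bool]


def verify_proof(chunk: bytes, proof: List[ProofStep], root: bytes) -> bool:
    """Re-hash the branch from the chunk up to the root and compare."""
    node = h(chunk)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Build a four-leaf tree by hand so the proof can be checked against it.
leaves = [h(c) for c in (b"tx1", b"tx2", b"tx3", b"tx4")]
left = h(leaves[0] + leaves[1])
right = h(leaves[2] + leaves[3])
root = h(left + right)

# Branch for b"tx3": its sibling tx4 is on the right, then the left subtree.
proof_tx3 = [(leaves[3], False), (left, True)]
```

The verifier only needs the chunk, two sibling hashes, and the root hash, rather than the whole tree.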
In summary, the benefit of using a Merkle Patricia tree is that the root node of this structure is cryptographically dependent on the data stored in the tree, and so the hash of the root node can be used as a secure identity for this data. Since the block header includes the root hash of the state, transactions, and receipts trees, any node can validate a small part of the state of Ethereum without needing to store the entire state, which can be potentially unbounded in size.
One very important concept in Ethereum is the concept of fees. Every computation that occurs as a result of a transaction on the Ethereum network incurs a fee: there's no free lunch! This fee is paid in a denomination called gas.
Gas is the unit used to measure the fees required for a particular computation. Gas price is the amount of Ether you are willing to spend on every unit of gas, and is measured in gwei. Wei is the smallest unit of Ether, where 10^18 Wei represents 1 Ether. One gwei is 1,000,000,000 Wei.
With every transaction, a sender sets a gas limit and gas price. The product of gas price and gas limit represents the maximum amount of Wei that the sender is willing to pay for executing a transaction.
For example, let's say the sender sets the gas limit to 50,000 and a gas price to 20 gwei. This implies that the sender is willing to spend at most 50,000 x 20 gwei = 1,000,000,000,000,000 Wei = 0.001 Ether to execute that transaction.
Remember that the gas limit represents the maximum gas the sender is willing to spend money on. If they have enough Ether in their account balance to cover this maximum, they're good to go. The sender is refunded for any unused gas at the end of the transaction, exchanged at the original rate.
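The same arithmetic can be checked in a few lines of Python. The constant and function names are just illustrative.

```python
WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18


def max_fee_wei(gas_limit: int, gas_price_gwei: int) -> int:
    """Maximum amount of Wei the sender is willing to pay."""
    return gas_limit * gas_price_gwei * WEI_PER_GWEI


fee_wei = max_fee_wei(gas_limit=50_000, gas_price_gwei=20)
fee_ether = fee_wei / WEI_PER_ETHER
```

Working in integer Wei and converting to Ether only for display avoids rounding surprises.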
In the case that the sender does not provide the necessary gas to execute the transaction, the transaction runs out of gas and is considered invalid. In this case, the transaction processing aborts and any state changes that occurred are reversed, such that we end up back at the state of Ethereum prior to the transaction. Additionally, a record of the transaction failing gets recorded, showing what transaction was attempted and where it failed. And since the machine already expended effort to run the calculations before running out of gas, logically, none of the gas is refunded to the sender.
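The two outcomes, a refund of unused gas or an out-of-gas failure with no refund, can be sketched like this. The `settle_gas` function and the figures in the usage lines are hypothetical, and the sketch ignores everything except the gas accounting.

```python
from typing import Tuple


def settle_gas(gas_limit: int, gas_needed: int, gas_price_wei: int) -> Tuple[bool, int, int]:
    """Return (succeeded, fee_paid_wei, refund_wei) for one transaction."""
    if gas_needed > gas_limit:
        # Out of gas: state changes are reverted, but the whole limit
        # has been consumed, so nothing is refunded.
        return False, gas_limit * gas_price_wei, 0
    unused = gas_limit - gas_needed
    # Unused gas is refunded at the original gas price.
    return True, gas_needed * gas_price_wei, unused * gas_price_wei


# Hypothetical figures: a limit of 50,000 gas at 20 Wei per unit of gas.
ok = settle_gas(gas_limit=50_000, gas_needed=21_000, gas_price_wei=20)
out_of_gas = settle_gas(gas_limit=50_000, gas_needed=60_000, gas_price_wei=20)
```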
Where exactly does this gas money go? All the money spent on gas by the sender is sent to the beneficiary address, which is typically the miner's address. Since miners are expending the effort to run computations and validate transactions, miners receive the gas fee as a reward.
Typically, the higher the gas price the sender is willing to pay, the greater the value the miner derives from the transaction. Thus, the more likely miners will be to select it. In this way, miners are free to choose which transactions they want to validate or ignore. In order to guide senders on what gas price to set, miners have the option of advertising the minimum gas price for which they will execute transactions.
Not only is gas used to pay for computation steps, it is also used to pay for storage usage. The total fee for storage is proportional to the smallest multiple of 32 bytes used.
Fees for storage have some nuanced aspects. For example, since increased storage increases the size of the Ethereum state database on all nodes, there's an incentive to keep the amount of data stored small. For this reason, if a transaction has a step that clears an entry in the storage, the fee for executing that operation is waived, AND a refund is given for freeing up storage space.
One important aspect of the way Ethereum works is that every single operation executed by the network is simultaneously effected by every full node. However, computational steps on the Ethereum Virtual Machine are very expensive. Therefore, Ethereum smart contracts are best used for simple tasks, like running simple business logic or verifying signatures and other cryptographic objects, rather than more complex uses, like file storage, email, or machine learning, which can put a strain on the network. Imposing fees prevents users from overtaxing the network.
Ethereum is a Turing complete language. (In short, a Turing machine is a machine that can simulate any computer algorithm.) This allows for loops and makes Ethereum susceptible to the halting problem, a problem in which you cannot determine whether or not a program will run infinitely. If there were no fees, a malicious actor could easily try to disrupt the network by executing an infinite loop within a transaction, without any repercussions. Thus, fees protect the network from deliberate attacks.
You might be thinking, why do we also have to pay for storage? Well, just like computation, storage on the Ethereum network is a cost that the entire network has to take the burden of.
We noted earlier that Ethereum is a transaction-based state machine. In other words, transactions occurring between different accounts are what move the global state of Ethereum from one state to the next.
In the most basic sense, a transaction is a cryptographically signed piece of instruction that is generated by an externally owned account, serialized, and then submitted to the blockchain.
There are two types of transactions: message calls and contract creations (i.e. transactions that create new Ethereum contracts). All transactions contain the following components, regardless of their type:
We learned in the Accounts section that transactions (both message calls and contract-creating transactions) are always initiated by externally owned accounts and submitted to the blockchain. Another way to think about it is that transactions are what bridge the external world to the internal state of Ethereum.
But this doesn't mean that contracts can't talk to other contracts. Contracts that exist within the global scope of Ethereum's state can talk to other contracts within that same scope. The way they do this is via messages or internal transactions to other contracts. We can think of messages or internal transactions as being similar to transactions, with the major difference that they are NOT generated by externally owned accounts. Instead, they are generated by contracts. They are virtual objects that, unlike transactions, are not serialized and only exist in the Ethereum execution environment.
When one contract sends an internal transaction to another contract, the associated code that exists on the recipient contract account is executed.
One important thing to note is that internal transactions or messages don't contain a gasLimit. This is because the gas limit is determined by the external creator of the original transaction (i.e. some externally owned account). The gas limit that the externally owned account sets must be high enough to carry out the transaction, including any sub-executions that occur as a result of that transaction, such as contract-to-contract messages. If, in the chain of transactions and messages, a particular message execution runs out of gas, then that message's execution will revert, along with any subsequent messages triggered by the execution. However, the parent execution does not need to revert.
All transactions are grouped together into blocks. A blockchain contains a series of such blocks that are chained together.
In Ethereum, a block consists of:
What the heck is an ommer? An ommer is a block whose parent is equal to the current block's parent's parent. Let's take a quick dive into what ommers are used for and why a block contains the block headers for ommers.
Because of the way Ethereum is built, block times are much lower (~15 seconds) than those of other blockchains, like Bitcoin (~10 minutes). This enables faster transaction processing. However, one of the downsides of shorter block times is that more competing block solutions are found by miners. These competing blocks are also referred to as orphaned blocks (i.e. mined blocks that do not make it into the main chain).
The purpose of ommers is to help reward miners for including these orphaned blocks. The ommers that miners include must be valid, meaning within the sixth generation or smaller of the present block. After six children, stale orphaned blocks can no longer be referenced (because including older transactions would complicate things a bit).
Ommer blocks receive a smaller reward than a full block. Nonetheless, there's still some incentive for miners to include these orphaned blocks and reap a reward.
Let's get back to blocks for a moment. We mentioned previously that every block has a block header, but what exactly is this? A block header is a portion of the block consisting of:
Notice how every block header contains three trie structures, for the state, the transactions, and the receipts.
These trie structures are nothing but the Merkle Patricia tries we discussed earlier.
Additionally, there are a few terms from the above description that are worth clarifying. Let's take a look.
Ethereum allows for logs to make it possible to track various transactions and messages. A contract can explicitly generate a log by defining events that it wants to log.
A log entry contains:
Logs are stored in a bloom filter, which stores the endless log data in an efficient manner.
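For readers who haven't met bloom filters before, here is a generic sketch. It is not Ethereum's exact construction: the filter size, the number of bit positions per item, and the use of SHA-256 (instead of KECCAK-256) are assumptions chosen to keep the example short.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """A fixed-size bit set that answers 'possibly present' or 'definitely absent'."""

    def __init__(self, size_bits: int = 2048, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: bytes) -> Iterator[int]:
        digest = hashlib.sha256(item).digest()
        for i in range(self.num_hashes):
            # Take two bytes of the digest per bit position.
            chunk = digest[2 * i: 2 * i + 2]
            yield int.from_bytes(chunk, "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


logs = BloomFilter()
logs.add(b"Transfer")  # hypothetical log topic
```

A bloom filter can return false positives but never false negatives, so a "no" answer lets a node skip a block entirely when searching for a particular log.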
Logs stored in the header come from the log information contained in the transaction receipt. Just as you receive a receipt when you buy something at a store, Ethereum generates a receipt for every transaction. Like you'd expect, each receipt contains certain information about the transaction. This receipt includes items like:
The difficulty of a block is used to enforce consistency in the time it takes to validate blocks. The genesis block has a difficulty of 131,072, and a special formula is used to calculate the difficulty of every block thereafter. If a certain block is validated more quickly than the previous block, the Ethereum protocol increases that block's difficulty.
The difficulty of the block affects the nonce, which is a hash that must be calculated when mining a block, using the proof-of-work algorithm.
The relationship between the block's difficulty and nonce is mathematically formalized as:
n ≤ 2^256 / Hd
where n is the proof-of-work result derived from the nonce and Hd is the difficulty.
The only way to find a nonce that meets a difficulty threshold is to use the proof-of-work algorithm to enumerate all of the possibilities. The expected time to find a solution is proportional to the difficulty: the higher the difficulty, the harder it becomes to find the nonce, and so the harder it is to validate the block, which in turn increases the time it takes to validate a new block. So, by adjusting the difficulty of a block, the protocol can adjust how long it takes to validate a block.
If, on the other hand, validation time is getting slower, the protocol decreases the difficulty. In this way, the validation time self-adjusts to maintain a constant rate: on average, one block every 15 seconds.
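A toy version of this nonce search might look like the following. It assumes the threshold takes the form 2^256 divided by the difficulty, as in the yellow paper, and it uses SHA-256 as a stand-in for Ethereum's real proof-of-work function. The header bytes, the tiny difficulty, and the helper names are hypothetical.

```python
import hashlib


def pow_value(header: bytes, nonce: int) -> int:
    """Hash the header together with a candidate nonce into a 256-bit integer."""
    data = header + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def meets_threshold(header: bytes, nonce: int, difficulty: int) -> bool:
    # Assumed threshold: the result must be at most 2^256 / difficulty.
    return pow_value(header, nonce) <= 2**256 // difficulty


def mine(header: bytes, difficulty: int) -> int:
    """Enumerate nonces until one meets the difficulty threshold."""
    nonce = 0
    while not meets_threshold(header, nonce, difficulty):
        nonce += 1
    return nonce


# A deliberately low difficulty so the search finishes quickly.
nonce = mine(b"example header", difficulty=1_000)
```

Doubling the difficulty halves the fraction of hashes that pass, so the expected number of attempts doubles as well.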
We've come to one of the most complex parts of the Ethereum protocol: the execution of a transaction. Say you send a transaction off into the Ethereum network to be processed. What happens to transition the state of Ethereum to include your transaction?
First, all transactions must meet an initial set of requirements in order to be executed. These include:
If the transaction meets all of the above requirements for validity, then we move onto the next step.
First, we deduct the upfront cost of execution from the sender's balance, and increase the nonce of the sender's account by 1 to account for the current transaction. At this point, we can calculate the gas remaining as the total gas limit for the transaction minus the intrinsic gas used.
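As a rough sketch of this first step, the bookkeeping could look like the code below. It assumes the upfront cost is simply the gas limit multiplied by the gas price, and the `Account` class, `begin_execution` function, and example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: int  # in Wei
    nonce: int


def begin_execution(sender: Account, gas_limit: int, gas_price_wei: int,
                    intrinsic_gas: int) -> int:
    """Charge the upfront cost, bump the nonce, and return the gas remaining."""
    upfront_cost = gas_limit * gas_price_wei  # assumption: gas limit x gas price
    if intrinsic_gas > gas_limit:
        raise ValueError("gas limit is below the intrinsic gas")
    if sender.balance < upfront_cost:
        raise ValueError("balance cannot cover the upfront cost")
    sender.balance -= upfront_cost
    sender.nonce += 1
    return gas_limit - intrinsic_gas


# Hypothetical sender with 1 Ether, paying 20 gwei per unit of gas.
sender = Account(balance=10**18, nonce=0)
gas_remaining = begin_execution(sender, gas_limit=50_000,
                                gas_price_wei=20 * 10**9, intrinsic_gas=21_000)
```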
Next, the transaction starts executing. Throughout the execution of a transaction, Ethereum keeps track of the substate. This substate is a way to record information accrued during the transaction that will be needed immediately after the transaction completes. Specifically, it contains:
Next, the various computations required by the transaction are processed.
Once all the steps required by the transaction have been processed, and assuming there is no invalid state, the state is finalized by determining the amount of unused gas to be refunded to the sender. In addition to the unused gas, the sender is also refunded some allowance from the refund balance that we described above.
Once the sender is refunded:
Finally, we're left with the new state and a set of the logs created by the transaction.
Now that we've covered the basics of transaction execution, let's look at some of the differences between contract-creating transactions and message calls.
Recall that in Ethereum, there are two types of accounts: contract accounts and externally owned accounts. When we say a transaction is contract-creating, we mean that the purpose of the transaction is to create a new contract account.
In order to create a new contract account, we first declare the address of the new account using a special formula. Then we initialize the new account by:
Once we initialize the account, we can actually create the account, using the init code sent with the transaction (see the Transaction and messages section for a refresher on the init code). What happens during the execution of this init code is varied. Depending on the constructor of the contract, it might update the account's storage, create other contract accounts, make other message calls, etc.
As the code to initialize a contract is executed, it uses gas. The transaction is not allowed to use up more gas than the remaining gas. If it does, the execution will hit an out-of-gas (OOG) exception and exit. If the transaction exits due to an out-of-gas exception, then the state is reverted to the point immediately prior to the transaction. The sender is not refunded the gas that was spent before running out.
Boo hoo.
However, if the sender sent any Ether value with the transaction, the Ether value will be refunded even if the contract creation fails. Phew!
If the initialization code executes successfully, a final contract-creation cost is paid. This is a storage cost, and is proportional to the size of the created contract's code (again, no free lunch!). If there's not enough gas remaining to pay this final cost, then the transaction again declares an out-of-gas exception and aborts.
If all goes well and we make it this far without exceptions, then any remaining unused gas is refunded to the original sender of the transaction, and the altered state is now allowed to persist!
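The final contract-creation cost can be sketched as a simple check. The per-byte rate is passed in as a parameter because the text only says the cost is proportional to code size. The `OutOfGas` exception, the function name, and the figures are hypothetical.

```python
class OutOfGas(Exception):
    """Raised when the remaining gas cannot cover a required cost."""


def finish_contract_creation(gas_remaining: int, code_size_bytes: int,
                             gas_per_byte: int) -> int:
    """Pay the code-storage cost and return the gas left to refund."""
    creation_cost = code_size_bytes * gas_per_byte  # proportional to code size
    if creation_cost > gas_remaining:
        raise OutOfGas("not enough gas to pay the contract-creation cost")
    return gas_remaining - creation_cost


# Hypothetical figures: 40 bytes of code at 200 gas per byte.
refundable = finish_contract_creation(gas_remaining=10_000,
                                      code_size_bytes=40, gas_per_byte=200)
```

Whatever is returned is the unused gas that goes back to the original sender.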
Hooray!
The execution of a message call is similar to that of a contract creation, with a few differences.
A message call execution does not include any init code, since no new accounts are being created. However, it can contain input data, if this data was provided by the transaction sender. Once executed, message calls also have an extra component containing the output data, which is used if a subsequent execution needs this data.
See the original post:
How does Ethereum work, anyway? - Medium