
Ethereum Source Code Walkthrough

Published 2018-06-13.
Time to read: 6 minutes.

This page is part of the posts collection, categorized under Blockchain, Ethereum, Go, Open Source.

April 27, 2020 Update

This was a sample work in progress as part of a proposal to the Ethereum Foundation Grants committee. They declined to fund this activity, but gave no reason and no feedback as to what might be acceptable. The preparation of my proposal took weeks, and the preliminary feedback that I had received from many knowledgeable people was that it had great value.

I felt that the evaluation process was broken; in fact, the entire organization was broken, and there was little hope that Ethereum would ever become a professional organization. Two years later, I believe history has proved me right. This was the last blockchain-related initiative I participated in.

I no longer maintain web3j-scala, an Ethereum-related project for Scala programmers. I created that open source project and worked on it for free for 3 years. It has been forked and I’ve been told it is or was used in production. However, I see no reason to continue working for free on it. Others can carry the project forward if they want.

This is a brief walkthrough of some of the core source files for smart contracts in the official Go language Ethereum implementation, which includes the geth command-line Ethereum client program along with many other programs. Ethereum clients include an implementation of the Ethereum Virtual Machine (EVM). They are able to parse and verify the Ethereum blockchain, including smart contracts, and they provide interfaces to create transactions and mine blocks.

I’ve added some suggestions for how the source code might be improved. If there is general agreement that these suggestions make sense (tell me in the comments!) then I’ll create a pull request.

License


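The go-ethereum library code (everything outside the cmd directory) is released under the GNU Lesser General Public License, version 3 or later, which permits use of the code as a library in proprietary programs. The executables in the cmd directory are released under the GNU General Public License, version 3 or later.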

Source Files

The gocloc program counted the following source files and lines:

Language          Files  Blank Lines  Comment Lines  Code Lines
Go                1,824       58,134         81,861     639,435
C                    55       17,257         30,909      84,719
C Header             97        2,559          6,318      15,083
Markdown             88        3,152              0       9,175
JavaScript           13        1,845          4,495       7,986
Assembly             39          557            957       3,783
JSON                 17            0              0       2,065
Protocol Buffers      2          113             40       1,030
Plain Text           11          217              0         954
C++                   4          132            102         937
BASH                 10          178            315         931
Perl                 10          268          1,289         879
JSX                  11          119            245         722
XML                   9            0              0         651
M4                    4           79             99         649
YAML                 20           77             42         581
NSIS                  5           86            154         446
Java                  4          143            187         438
Makefile             11          101             84         381
Python                6          154            250         339
HTML                  3           15              9         245
Solidity              7           56            171         213
Bourne Shell          6           23             25         119
CMake                 1            9              0          35
Awk                   1            4              4          17
TOML                  1            0              0           3
Total             2,260       85,278        127,556     771,825

Packages

I used the following incantation to discover that the geth source tree declares 244 distinct package names:

Shell
$ grep -rh "^package" | grep -v "not installed" | \
  tr -d ';' | sed 's^//.*^^' | awk '{$1=$1};1' | \
  sort | uniq | wc -l

I won't list them all. Much of the following description of the top-level packages comes from the project's godoc. I gathered the rest from various sources, including the source code itself:

  • accounts – implements high-level Ethereum account management.
  • bmt – provides a binary Merkle tree implementation.
  • cmd – contains the following command-line tools. Most tools support the --help option.
    • abigen – a source code generator that converts Ethereum contract definitions into easy-to-use, compile-time type-safe Go packages. It operates on plain Ethereum contract ABIs, with expanded functionality if the contract bytecode is also available. It also accepts Solidity source files, which makes development much more streamlined. See the Native DApps wiki page for details.
    • bootnode – runs a bootstrap node for the Ethereum Discovery Protocol. This is a stripped-down version of geth that only takes part in the network node discovery protocol and does not run any of the higher-level application protocols. It can be used as a lightweight bootstrap node to aid in finding peers in private networks.
    • clef – a standalone signer that manages keys across multiple Ethereum-aware apps such as Geth, Metamask, and cpp-ethereum. Alpha quality; not released yet.
    • ethkey – a key/wallet management tool for Ethereum keys. It allows users to add, remove and change their keys, and it supports cold-wallet-friendly transaction inspection and signing. This documentation was written for the C++ Ethereum client implementation, but it is probably suitable for the Go implementation as well.
    • evm – a version of the EVM (Ethereum Virtual Machine) for running bytecode snippets within a configurable environment and execution mode. It allows isolated, fine-grained debugging of EVM opcodes. Example usage:
      Shell
      evm --code 60ff60ff --debug run
    • faucet – an Ether faucet backed by a light client.
    • geth – the official command-line client for Ethereum. It provides the entry point into the Ethereum network (main, test or private net). It can run as a full node (the default), an archive node (retaining all historical state) or a light node (retrieving data live). Other processes can use it as a gateway into the Ethereum network via JSON RPC endpoints exposed over HTTP, WebSocket and/or IPC transports. For more information, see the CLI Wiki page.
    • p2psim – provides a command-line client for a simulation HTTP API.
    • puppeth – assembles and maintains private networks.
    • rlpdump – a pretty-printer for RLP data. RLP (Recursive Length Prefix) is the data encoding used by the Ethereum protocol. Sample usage:
      Shell
      rlpdump --hex CE0183FFFFFFC4C304050583616263
    • swarm – provides the bzzhash command, which computes a swarm tree hash, and implements the swarm daemon and tools. See the swarm documentation for more information.
    • wnode – a simple Whisper node. It can be used as a stand-alone bootstrap node, and also for various testing and diagnostic purposes.
  • common – contains various helper functions worth checking out.
  • consensus – implements different Ethereum consensus engines, which must conform to the Engine interface. clique implements proof-of-authority consensus, and ethash implements proof-of-work consensus.
  • console – Ethereum implements a JavaScript runtime environment (JSRE) that can be used in either interactive (console) or non-interactive (script) mode. Ethereum's JavaScript console exposes the full web3 JavaScript Dapp API and the admin API. This package implements the JSRE for the geth console and geth attach subcommands.
  • containers
  • contracts
  • core – implements the Ethereum consensus protocol and the Ethereum Virtual Machine, along with other important pieces.
  • crypto – cryptographic implementations.
  • dashboard
  • eth – implements the Ethereum protocol.
  • ethclient – provides a client for the Ethereum RPC API.
  • ethdb
  • ethstats – implements the network stats reporting service.
  • event – deals with subscriptions to real-time events.
  • internal – debugging support, JavaScript dependencies, and testing support.
  • les – implements the Light Ethereum Subprotocol.
  • light – implements on-demand retrieval capable state and chain objects for the Ethereum Light Client.
  • log – provides an opinionated, simple toolkit for best-practice logging that is both human and machine readable.
  • metrics – a port of Coda Hale's Metrics library. It is unclear why this was not kept as a separate library, as other Go ports of Metrics are.
  • miner – implements Ethereum block creation and mining.
  • mobile – contains the simplified mobile APIs to go-ethereum.
  • node – sets up multi-protocol Ethereum nodes.
  • p2p – implements the Ethereum p2p network protocols: the Node Discovery Protocol, the RLPx v5 Topic Discovery Protocol, Ethereum Node Records as defined in EIP-778, common network port mapping protocols, and p2p network simulation.
  • params
  • rlp – implements the RLP serialization format.
  • rpc – provides access to the exported methods of an object across a network or other I/O connection.
  • signer
  • swarm
  • tests – implements execution of Ethereum JSON tests.
  • trie – implements Merkle Patricia Tries.
  • vendor – contains third-party dependencies copied into the repository. Examples include a minimal framework for creating and organizing command-line Go applications, and a rich testing extension for Go's testing package.
  • whisper – implements the Whisper protocol.

I used the following incantation to list the top-level directory names:

Shell
find . -maxdepth 1 -type d | sed 's^\./^^' | sed '/\..*/d'

The build/ directory does not contain a Go source package; instead, it contains scripts and configurations for building the package in various environments.

Smart Contract Source Code

The core/vm directory contains the files that implement the EVM. These files are part of the vm package. Let's look at two of them:

  • contract.go, which defines smart contract behavior.
  • contracts.go, responsible for executing smart contracts on the EVM.

Referenced Types

Two of the types used in the source files that we would like to understand are defined in common/types.go. Let's look at them first.

Address is defined as an array of 20 bytes:

Go
const (
    HashLength    = 32
    AddressLength = 20
)

// Address represents the 20 byte address of an Ethereum account.
type Address [AddressLength]byte

Hash is defined as an array of 32 bytes:

Go
// Hash represents the 32 byte Keccak256 hash of arbitrary data.
type Hash [HashLength]byte
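Because both types are fixed-size arrays rather than slices, they are comparable values and can be used directly as map keys. The sketch below repeats the two declarations and adds a conversion helper. BytesToAddress and Hex here are my own simplified versions, written to mimic what I understand go-ethereum's common.BytesToAddress and Address.Hex to do. The real Hex also applies an EIP-55 checksum, which this sketch omits.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

const (
	HashLength    = 32
	AddressLength = 20
)

// Address and Hash repeat the fixed-size array declarations from
// common/types.go so this sketch is self-contained.
type Address [AddressLength]byte
type Hash [HashLength]byte

// BytesToAddress (a sketch) keeps only the rightmost 20 bytes of a longer
// slice and left-pads a shorter one with zeros.
func BytesToAddress(b []byte) Address {
	var a Address
	if len(b) > AddressLength {
		b = b[len(b)-AddressLength:]
	}
	copy(a[AddressLength-len(b):], b)
	return a
}

// Hex returns the 0x-prefixed lower-case hex form, without any checksum.
func (a Address) Hex() string { return "0x" + hex.EncodeToString(a[:]) }

func main() {
	a := BytesToAddress([]byte{0x01})
	fmt.Println(a.Hex())

	// Fixed-size arrays are comparable values, so they can be map keys,
	// compared with ==, and copied by assignment.
	balances := map[Address]uint64{a: 100}
	fmt.Println(balances[BytesToAddress([]byte{0x01})]) // 100

	var h Hash
	fmt.Println(len(h), len(a)) // 32 20
}
```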

The opcodes for each version of the EVM are defined in jump_table.go. The operation struct defines the properties of each opcode:

Go
type operation struct {
    // execute is the operation function
    execute executionFunc
    // gasCost is the gas function and returns the gas required for execution
    gasCost gasFunc
    // validateStack validates the stack (size) for the operation
    validateStack stackValidationFunc
    // memorySize returns the memory size required for the operation
    memorySize memorySizeFunc

    halts   bool // indicates whether the operation should halt further execution
    jumps   bool // indicates whether the program counter should not increment
    writes  bool // determines whether this a state modifying operation
    valid   bool // indication whether the retrieved operation is valid and known
    reverts bool // determines whether the operation reverts state (implicitly halts)
    returns bool // determines whether the operations sets the return data content
}

Notice the jumps property, a Boolean. If it is set, the interpreter does not increment the program counter after executing the operation. It is set for the jump opcodes (JUMP and JUMPI), which update the program counter themselves.

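The destinations type, defined in analysis.go, maps the hash of a contract's code to a bit vector (bitvec) produced by analysing that code. As far as I can tell from analysis.go, a set bit marks a byte that is the immediate data of a PUSH1 to PUSH32 instruction, and a clear bit marks an opcode. A jump is valid only if its target byte is a JUMPDEST opcode and is not PUSH data. The analysis is computed once per code hash and then cached. analysis.go defines the destinations type like this:

Go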
// destinations stores one map per contract (keyed by hash of code).
// The maps contain an entry for each location of a JUMPDEST instruction.
type destinations map[common.Hash]bitvec

contract.go

This file defines smart contract behavior.

Imports

This comment applies to all of the Go source files in the entire project. I think the following absolute import would have been better specified as a relative import:

Go
"github.com/ethereum/go-ethereum/common"

The relative import would look like this instead:

Go
"../../common"

If relative imports were used instead of absolute imports that point to the GitHub repository, local changes made to the project by a developer would automatically be picked up. As currently written, absolute imports cause local changes to be ignored in favor of the version on GitHub. It might take a software developer a while to realize that their changes are ignored by most of the code base because absolute imports were used. It would then be painful for the developer to modify the affected source files throughout the project so that they used relative imports.

Types

The publicly visible AccountRef type is defined as:

Go
// Account references are used during EVM initialisation and
// it's primary use is to fetch addresses. Removing this object
// proves difficult because of the cached jump destinations which
// are fetched from the parent contract (i.e. the caller), which
// is a ContractRef.
type AccountRef common.Address

The same file defines an Address method that converts an AccountRef back to an Address:

Go
// Address casts AccountRef to a Address
func (ar AccountRef) Address() common.Address { return (common.Address)(ar) }

The ContractRef interface is used by the Contract struct, which we'll see in a moment. This ContractRef interface consists of a single method, Address(), which returns a common.Address.

Go
// ContractRef is a reference to the contract's backing object
type ContractRef interface {
    Address() common.Address
}
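Because ContractRef only requires an Address() method, AccountRef satisfies it, and so does any other type with that method. The following self-contained sketch repeats the declarations, with a local Address type standing in for common.Address. It also adds a compile-time assertion, a common Go idiom that is not part of the original file.

```go
package main

import "fmt"

// Address stands in for common.Address.
type Address [20]byte

// ContractRef and AccountRef repeat the declarations from contract.go.
type ContractRef interface {
	Address() Address
}

type AccountRef Address

// Address converts the AccountRef back to an Address. Both types have the
// same underlying [20]byte type, so a plain conversion suffices.
func (ar AccountRef) Address() Address { return (Address)(ar) }

// Compile-time assertion that AccountRef satisfies ContractRef.
var _ ContractRef = AccountRef{}

func main() {
	var addr Address
	addr[19] = 0x2a
	var ref ContractRef = AccountRef(addr)
	fmt.Printf("%x\n", ref.Address())
	fmt.Println(ref.Address() == addr) // true
}
```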

The Contract struct represents a smart contract while it is being executed. It is central to the topic, so here it is in all its glory:

Go
type Contract struct {
    CallerAddress common.Address
    caller    ContractRef
    self      ContractRef

    jumpdests destinations // result of JUMPDEST analysis.

    Code     []byte
    CodeHash common.Hash
    CodeAddr *common.Address
    Input    []byte

    Gas   uint64
    value *big.Int

    Args []byte

    DelegateCall bool
}

CallerAddress is the publicly visible Address of the caller. caller and self are private ContractRefs, which, as we have seen, only expose an Address.

jumpdests is a private field of type destinations. It caches the result of JUMPDEST analysis for the contract's code, so that jump targets can be validated without re-analysing the code on every jump.

Code is a publicly visible byte slice holding the contract's compiled EVM bytecode, which is what the interpreter executes. It is not Solidity source code.

CodeHash is the publicly visible hash of Code. CodeAddr is a publicly visible pointer to the Address from which the code was loaded. This can differ from the contract's own address when code is borrowed via CALLCODE or DELEGATECALL.

Gas is the publicly visible amount of gas still available for executing this contract, stored as an unsigned 64-bit integer. It decreases as execution consumes gas.

value is a private pointer to a big.Int holding the amount of wei (Ether) transferred with the call. The Value method exposes it.

Args is a publicly visible byte slice; its purpose is not clear from this file.

DelegateCall is a publicly visible Boolean that is set when the contract is executed via the DELEGATECALL opcode; the AsDelegate method in contract.go sets it. From the Solidity documentation: "This means that a contract can dynamically load code from a different address at runtime. Storage, current address and balance still refer to the calling contract, only the code is taken from the called address. This makes it possible to implement the “library” feature in Solidity: Reusable library code that can be applied to a contract’s storage, e.g. in order to implement a complex data structure."

contracts.go

This file is responsible for executing smart contracts on the EVM.

Imports

The following imports are used:

  • crypto/sha256, from the Go standard library, implements the SHA224 and SHA256 hash algorithms as defined in FIPS 180-4.
  • errors, from the Go standard library, provides simple error handling primitives such as errors.New.
  • math/big implements arbitrary-precision arithmetic (big numbers).
  • Other packages in this project (go-ethereum):
    Go
    "github.com/ethereum/go-ethereum/common"
    "github.com/ethereum/go-ethereum/common/math"
    "github.com/ethereum/go-ethereum/crypto"
    "github.com/ethereum/go-ethereum/crypto/bn256"
    "github.com/ethereum/go-ethereum/params"

    Again, I think the above imports would have been better specified as relative imports:

    Go
    "../../common"
    "../../common/math"
    "../../crypto"
    "../../crypto/bn256"
    "../../params"
  • ripemd160 implements the RIPEMD-160 hash algorithm, a secure replacement for the MD4 and MD5 hash functions. These hashes are also termed RIPE message digests.

Type PrecompiledContract

PrecompiledContract is the interface for native Go smart contracts. Every precompiled contract implements its two methods, RequiredGas and Run, as we will see next. RunPrecompiledContract, discussed below, also takes a pointer to Contract, the struct defined in contract.go.
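The interface itself is small. The sketch below declares it with its two methods, RequiredGas and Run, plus simplified versions of the sha256hash and dataCopy contracts. The gas formula is a base cost plus a cost per 32-byte word of input. The constants (60 and 12 for SHA-256, 15 and 3 for identity) are the values given in the Ethereum Yellow Paper. In go-ethereum they live in the params package, so treat the local constant names as assumptions.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// PrecompiledContract: native Go code that reports its gas cost and
// produces output for the given input.
type PrecompiledContract interface {
	RequiredGas(input []byte) uint64
	Run(input []byte) ([]byte, error)
}

// Gas constants as given in the Yellow Paper (local names are assumptions).
const (
	sha256BaseGas      = 60
	sha256PerWordGas   = 12
	identityBaseGas    = 15
	identityPerWordGas = 3
)

// words returns the number of 32-byte words needed to hold n bytes.
func words(n int) uint64 { return (uint64(n) + 31) / 32 }

// sha256hash returns the SHA-256 digest of its input.
type sha256hash struct{}

func (sha256hash) RequiredGas(input []byte) uint64 {
	return words(len(input))*sha256PerWordGas + sha256BaseGas
}

func (sha256hash) Run(input []byte) ([]byte, error) {
	h := sha256.Sum256(input)
	return h[:], nil
}

// dataCopy is the identity precompile: it returns a copy of its input.
type dataCopy struct{}

func (dataCopy) RequiredGas(input []byte) uint64 {
	return words(len(input))*identityPerWordGas + identityBaseGas
}

func (dataCopy) Run(input []byte) ([]byte, error) {
	out := make([]byte, len(input))
	copy(out, input)
	return out, nil
}

func main() {
	in := []byte("hello")
	for _, p := range []PrecompiledContract{sha256hash{}, dataCopy{}} {
		out, _ := p.Run(in)
		fmt.Printf("%T gas=%d out=%x\n", p, p.RequiredGas(in), out)
	}
}
```

Pricing per 32-byte word means a 5-byte input and a 32-byte input cost the same: 72 gas for sha256hash and 18 for dataCopy.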

Pre-Compiled Contract Maps

These maps associate addresses with precompiled contracts that implement cryptographic hashes and other cryptographic and utility functions.

PrecompiledContractsHomestead contains the default set of pre-compiled contract addresses used in the Frontier and Homestead releases of Ethereum: ecrecover, sha256hash, ripemd160hash and dataCopy.

PrecompiledContractsByzantium contains the default set of pre-compiled contract addresses used in the Byzantium Ethereum release. All of the previously defined pre-compiled contract addresses are provided in Byzantium, plus: bigModExp, bn256Add, bn256ScalarMul and bn256Pairing.

I’m not happy about the code duplication, whereby the contents of PrecompiledContractsHomestead are incorporated into PrecompiledContractsByzantium by listing the values again; this would be better expressed by referencing the values of PrecompiledContractsHomestead instead of duplicating them.
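The duplication could be avoided by building the Byzantium map from the Homestead map. In this sketch, extend is a hypothetical helper that copies a base map and adds new entries, leaving the base map untouched. The contracts are reduced to named placeholders (the Name method is also hypothetical) so that the map structure is the focus. The addresses 0x01 to 0x08 are the standard precompile addresses.

```go
package main

import "fmt"

type Address [20]byte

// BytesToAddress left-pads b into a 20-byte address; b is assumed to be at
// most 20 bytes long in this sketch.
func BytesToAddress(b []byte) Address {
	var a Address
	copy(a[len(a)-len(b):], b)
	return a
}

// PrecompiledContract is reduced to a name here (hypothetical method).
type PrecompiledContract interface{ Name() string }

type named string

func (n named) Name() string { return string(n) }

var PrecompiledContractsHomestead = map[Address]PrecompiledContract{
	BytesToAddress([]byte{1}): named("ecrecover"),
	BytesToAddress([]byte{2}): named("sha256hash"),
	BytesToAddress([]byte{3}): named("ripemd160hash"),
	BytesToAddress([]byte{4}): named("dataCopy"),
}

// extend returns a new map holding every entry of base plus extra, so a
// later release can build on an earlier one without repeating or mutating it.
func extend(base, extra map[Address]PrecompiledContract) map[Address]PrecompiledContract {
	out := make(map[Address]PrecompiledContract, len(base)+len(extra))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range extra {
		out[k] = v
	}
	return out
}

var PrecompiledContractsByzantium = extend(PrecompiledContractsHomestead, map[Address]PrecompiledContract{
	BytesToAddress([]byte{5}): named("bigModExp"),
	BytesToAddress([]byte{6}): named("bn256Add"),
	BytesToAddress([]byte{7}): named("bn256ScalarMul"),
	BytesToAddress([]byte{8}): named("bn256Pairing"),
})

func main() {
	for i := byte(1); i <= 8; i++ {
		if p, ok := PrecompiledContractsByzantium[BytesToAddress([]byte{i})]; ok {
			fmt.Printf("0x%02x %s\n", i, p.Name())
		}
	}
	fmt.Println(len(PrecompiledContractsHomestead), len(PrecompiledContractsByzantium)) // 4 8
}
```

Because extend copies rather than mutates, adding the Byzantium entries cannot accidentally change the Homestead set. That set is still needed when processing pre-Byzantium blocks.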

Contract Evaluator Function

The RunPrecompiledContract function runs and evaluates the output of a precompiled contract. It accepts three parameters:

  • A PrecompiledContract instance.
  • A byte slice ([]byte) of input data.
  • A pointer to a Contract, defined in contract.go and discussed above.

The function returns:

  • A byte slice containing the output of the contract.
  • An error value, which is nil on success.
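The function body can be sketched as charging gas first and running the contract only if enough gas is available. This sketch follows that logic with a Contract reduced to its gas counter. ErrOutOfGas and the identity stand-in contract (15 gas plus 3 per 32-byte word) are defined locally, so treat those names as assumptions rather than imports from go-ethereum.

```go
package main

import (
	"errors"
	"fmt"
)

type PrecompiledContract interface {
	RequiredGas(input []byte) uint64
	Run(input []byte) ([]byte, error)
}

// Contract is reduced to its gas counter for this sketch.
type Contract struct{ Gas uint64 }

// UseGas deducts gas if enough is available and reports success.
func (c *Contract) UseGas(gas uint64) bool {
	if c.Gas < gas {
		return false
	}
	c.Gas -= gas
	return true
}

var ErrOutOfGas = errors.New("out of gas")

// RunPrecompiledContract charges the contract for the precompile's gas
// cost first, and only runs the precompile if enough gas was available.
func RunPrecompiledContract(p PrecompiledContract, input []byte, contract *Contract) ([]byte, error) {
	gas := p.RequiredGas(input)
	if contract.UseGas(gas) {
		return p.Run(input)
	}
	return nil, ErrOutOfGas
}

// identity is a stand-in precompile: 15 gas plus 3 per 32-byte word.
type identity struct{}

func (identity) RequiredGas(input []byte) uint64 {
	return 15 + 3*((uint64(len(input))+31)/32)
}

func (identity) Run(input []byte) ([]byte, error) {
	return append([]byte(nil), input...), nil
}

func main() {
	c := &Contract{Gas: 20}
	out, err := RunPrecompiledContract(identity{}, []byte("hi"), c)
	fmt.Printf("%q %v gas left=%d\n", out, err, c.Gas) // "hi" <nil> gas left=2
	_, err = RunPrecompiledContract(identity{}, []byte("hi"), c)
	fmt.Println(err, c.Gas) // out of gas 2
}
```

Charging before running means a call with too little gas fails without doing any work, and its remaining gas is left as it was.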

Other Functions

  • RunPrecompiledContract – runs a precompiled contract and returns its output as a byte slice, along with an error.
  • RequiredGas – implemented by each precompiled contract type. Go does not support overloading; each type instead satisfies the PrecompiledContract interface. It computes the gas required for the input data, given as a byte slice, and returns it as a uint64.
  • Run – also implemented by each precompiled contract type. It executes the contract on the input byte slice and returns the output and an error. Some implementations, such as ripemd160hash, left-pad their output to 32 bytes.
  • newCurvePoint – unmarshals a binary blob into a bn256 elliptic curve point. Barreto-Naehrig (BN) curves are elliptic curves suitable for cryptographic pairings, which enable efficient, high-security cryptographic schemes. See the IETF paper on Barreto-Naehrig Curves for more information.