Mike Slinn

Ethereum Source Code Walkthrough

Published 2018-06-13.

This article is categorized under Blockchain, Ethereum, Go, Open Source

April 27, 2020

This was a sample work in progress as part of a proposal to the Ethereum Foundation Grants committee. They declined to fund this activity, but gave no reason and no feedback as to what might be acceptable. The preparation of my proposal took weeks, and the preliminary feedback that I had received from many knowledgeable people was that it had great value.

I felt that the evaluation process was broken, that in fact the entire organization was broken, and that there was little hope that Ethereum would ever become a professional organization. Two years later, I believe history has proved me right. This was the last blockchain-related initiative I participated in.

I no longer maintain web3j-scala, an Ethereum-related project for Scala programmers. I created that open source project and worked on it for free for 3 years. It has been forked and I’ve been told it is or was used in production. However, I see no reason to continue working for free on it. Others can carry the project forward if they want.

This is a brief walkthrough of some of the core source files for smart contracts in the official Go language Ethereum implementation, which includes the geth command-line Ethereum client program, along with many other programs. Ethereum clients include an implementation of the Ethereum Virtual Machine (EVM). They are able to parse and verify the Ethereum blockchain, including smart contracts, and they provide interfaces to create transactions and mine blocks.

I’ve added some suggestions for how the source code might be improved. If there is general agreement that these suggestions make sense (tell me in the comments!) then I’ll create a pull request.

License

This Ethereum client project was released under the GNU Lesser General Public License, version 3 or later, which permits use of the code as a library in proprietary programs.

Source Files

The gocloc program counted the following source files and lines:

Language          Files   Blank Lines   Comment Lines   Code Lines
Go                1,824        58,134          81,861      639,435
C                    55        17,257          30,909       84,719
C Header             97         2,559           6,318       15,083
Markdown             88         3,152               0        9,175
JavaScript           13         1,845           4,495        7,986
Assembly             39           557             957        3,783
JSON                 17             0               0        2,065
Protocol Buffers      2           113              40        1,030
Plain Text           11           217               0          954
C++                   4           132             102          937
BASH                 10           178             315          931
Perl                 10           268           1,289          879
JSX                  11           119             245          722
XML                   9             0               0          651
M                    44            79              99          649
YAML                 20            77              42          581
NSIS                  5            86             154          446
Java                  4           143             187          438
Makefile             11           101              84          381
Python                6           154             250          339
HTML                  3            15               9          245
Solidity              7            56             171          213
Bourne Shell          6            23              25          119
CMake                 1             9               0           35
Awk                   1             4               4           17
TOML                  1             0               0            3
Total             2,260        85,278         127,556      771,825

Packages

I used the following incantation to discover that geth defines 244 packages:

$ grep -rh "^package" | grep -v "not installed" | \
  tr -d ';' | sed 's^//.*^^' | awk '{$1=$1};1' | \
  sort | uniq | wc -l

I won't list them all. The godoc for the project contains much of the following documentation for the top-level packages. I provided the rest of the information from disparate sources, including reading the source code:

accounts: implements high-level Ethereum account management.
bmt: provides a binary Merkle tree implementation.
cmd: Contains the following command-line tools. Most tools support the --help option.
  • abigen: source code generator to convert Ethereum contract definitions into easy to use, compile-time type-safe Go packages. It operates on plain Ethereum contract ABIs with expanded functionality if the contract bytecode is also available. However it also accepts Solidity source files, making development much more streamlined. Please see the Native DApps wiki page for details.
  • bootnode: runs a bootstrap node for the Ethereum Discovery Protocol. This is a stripped-down version of geth that only takes part in the network node discovery protocol, and does not run any of the higher level application protocols. It can be used as a lightweight bootstrap node to aid in finding peers in private networks.
  • clef: a standalone signer that manages keys across multiple Ethereum-aware apps such as Geth, Metamask, and cpp-ethereum. Alpha quality, not released yet.
  • ethkey: a key/wallet management tool for Ethereum keys. Allows user to add, remove and change their keys, and supports cold wallet device-friendly transaction inspection and signing. This documentation was written for the C++ Ethereum client implementation, but it is probably suitable for the Go implementation as well.
  • evm: a version of the EVM (Ethereum Virtual Machine) for running bytecode snippets within a configurable environment and execution mode. Allows isolated, fine-grained debugging of EVM opcodes. Example usage:
      evm --code 60ff60ff --debug
  • faucet: an Ether faucet backed by a light client.
  • geth: official command-line client for Ethereum. It provides the entry point into the Ethereum network (main-, test- or private net), capable of running as a full node (default) archive node (retaining all historical state) or a light node (retrieving data live). It can be used by other processes as a gateway into the Ethereum network via JSON RPC endpoints exposed on top of HTTP, WebSocket and/or IPC transports. For more information see the CLI Wiki page.
  • p2psim: a simulation HTTP API. Docs are here.
  • puppeth: assembles and maintains private networks.
  • rlpdump: a pretty-printer for RLP data. RLP (Recursive Length Prefix) is the data encoding used by the Ethereum protocol. Sample usage:
      rlpdump --hex CE0183FFFFFFC4C304050583616263
  • swarm: provides the bzzhash command, which computes a swarm tree hash, and implements the swarm daemon and tools. See the swarm documentation for more information.
  • wnode: simple Whisper node. It could be used as a stand-alone bootstrap node. Also could be used for different test and diagnostics purposes.
common: contains various helper functions worth checking out
consensus: implements different Ethereum consensus engines (which must conform to the Engine interface): clique implements proof-of-authority consensus, and ethash implements proof-of-work consensus.
console: Ethereum implements a JavaScript runtime environment (JSRE) that can be used in either interactive (console) or non-interactive (script) mode. Ethereum's JavaScript console exposes the full web3 JavaScript Dapp API and the admin API. More documentation is here. This package implements JSRE for the geth console and geth attach subcommands.
containers
contracts
core: implements the Ethereum consensus protocol, implements the Ethereum Virtual Machine, and other miscellaneous important bits
crypto: cryptographic implementations
dashboard
eth: implements the Ethereum protocol
ethclient: provides a client for the Ethereum RPC API
ethdb
ethstats: implements the network stats reporting service
event: deals with subscriptions to real-time events
internal: Debugging support, JavaScript dependencies, testing support
les: implements the Light Ethereum Subprotocol
light: implements on-demand retrieval capable state and chain objects for the Ethereum Light Client
log: provides an opinionated, simple toolkit for best-practice logging that is both human and machine readable
metrics: port of Coda Hale's Metrics library. Unclear why this was not implemented as a separate library, like this one.
miner: implements Ethereum block creation and mining
mobile: contains the simplified mobile APIs to go-ethereum
node: sets up multi-protocol Ethereum nodes
p2p: implements the Ethereum p2p network protocols: Node Discovery Protocol, RLPx v5 Topic Discovery Protocol, Ethereum Node Records as defined in EIP-778, common network port mapping protocols, and p2p network simulation.
params
rlp: implements the RLP serialization format
rpc: provides access to the exported methods of an object across a network or other I/O connection
signer
swarm
tests: implements execution of Ethereum JSON tests
trie: implements Merkle Patricia Tries
vendor: contains a minimal framework for creating and organizing command line Go applications, and a rich testing extension for Go's testing package
whisper: implements the Whisper protocol

I used the following incantation to list the package names:

find . -maxdepth 1 -type d | sed 's^\./^^' | sed '/\..*/d'

The build/ directory does not contain a Go source package; instead, it contains scripts and configurations for building the package in various environments.

Smart Contract Source Code

The core/vm directory contains the files that implement the EVM. These files are part of the vm package. Let's look at two of them:

  • contract.go, which defines smart contract behavior.
  • contracts.go, which defines the precompiled contracts (smart contracts implemented natively in Go) and the function that runs them on the EVM.

Referenced Types

Two of the types used in the source files that we would like to understand are defined in common/types.go. Let's look at them first.

Address is defined as an array of 20 bytes:

const (
    HashLength    = 32
    AddressLength = 20
)

// Address represents the 20 byte address of an Ethereum account.
type Address [AddressLength]byte

Hash is defined as an array of 32 bytes:

// Hash represents the 32 byte Keccak256 hash of arbitrary data.
type Hash [HashLength]byte
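
Because both types are fixed-size arrays rather than slices, they are plain values: they can be compared with == and used as map keys. The following is a minimal sketch of that idea. The helper names bytesToAddress and hexString are my own and are only assumptions for illustration; they are not copied from common/types.go.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

const (
	HashLength    = 32
	AddressLength = 20
)

// Address and Hash mirror the array types quoted above.
type Address [AddressLength]byte
type Hash [HashLength]byte

// bytesToAddress is a hypothetical helper: it right-aligns b in a 20-byte
// array, dropping leading bytes if b is longer than 20 bytes.
func bytesToAddress(b []byte) Address {
	var a Address
	if len(b) > AddressLength {
		b = b[len(b)-AddressLength:]
	}
	copy(a[AddressLength-len(b):], b)
	return a
}

// hexString is a hypothetical helper that renders the address as
// 0x-prefixed lowercase hex.
func (a Address) hexString() string {
	return "0x" + hex.EncodeToString(a[:])
}

func main() {
	a := bytesToAddress([]byte{0x01, 0x02})
	b := bytesToAddress([]byte{0x01, 0x02})
	fmt.Println(a.hexString())
	fmt.Println(a == b) // arrays are values, so == compares all 20 bytes
	fmt.Println(len(a), len(Hash{}))
}
```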

The opcodes for each version of the EVM are defined in jump_table.go. The operation struct defines the properties:

type operation struct {
    // execute is the operation function
    execute executionFunc
    // gasCost is the gas function and returns the gas required for execution
    gasCost gasFunc
    // validateStack validates the stack (size) for the operation
    validateStack stackValidationFunc
    // memorySize returns the memory size required for the operation
    memorySize memorySizeFunc

    halts   bool // indicates whether the operation should halt further execution
    jumps   bool // indicates whether the program counter should not increment
    writes  bool // determines whether this a state modifying operation
    valid   bool // indication whether the retrieved operation is valid and known
    reverts bool // determines whether the operation reverts state (implicitly halts)
    returns bool // determines whether the operations sets the return data content
}

Notice the jumps property, a Boolean, which if set indicates that the program counter should not increment after executing any form of jump opcode.
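
To show why such a flag is useful, here is a minimal sketch of an interpreter loop that advances the program counter only when the operation has not set it. The cut-down operation struct, the table, and the run function are assumptions made for this illustration; the real interpreter also handles gas, stack validation, and jump destination checks, all of which are left out here.

```go
package main

import "fmt"

// operation is a cut-down, hypothetical version of the struct quoted above.
type operation struct {
	execute func(pc *uint64, code []byte, stack *[]uint64)
	halts   bool
	jumps   bool
}

const (
	opStop     = 0x00
	opJump     = 0x56
	opJumpdest = 0x5b
	opPush1    = 0x60
)

var table = map[byte]operation{
	opStop: {execute: func(pc *uint64, code []byte, st *[]uint64) {}, halts: true},
	opPush1: {execute: func(pc *uint64, code []byte, st *[]uint64) {
		*pc++ // step onto the immediate byte
		if *pc < uint64(len(code)) {
			*st = append(*st, uint64(code[*pc]))
		}
	}},
	opJump: {execute: func(pc *uint64, code []byte, st *[]uint64) {
		s := *st
		if len(s) == 0 {
			*pc = uint64(len(code)) // empty stack: end the sketch
			return
		}
		*pc = s[len(s)-1] // the operation sets pc itself
		*st = s[:len(s)-1]
	}, jumps: true},
	opJumpdest: {execute: func(pc *uint64, code []byte, st *[]uint64) {}},
}

// run executes code and returns the program counters it visited.
func run(code []byte) []uint64 {
	var (
		pc      uint64
		stack   []uint64
		visited []uint64
	)
	for pc < uint64(len(code)) {
		op, ok := table[code[pc]]
		if !ok {
			break // unknown opcode: stop the sketch
		}
		visited = append(visited, pc)
		op.execute(&pc, code, &stack)
		if op.halts {
			break
		}
		if !op.jumps {
			pc++ // only non-jump operations advance the counter here
		}
	}
	return visited
}

func main() {
	// PUSH1 0x04, JUMP, STOP (skipped), JUMPDEST, STOP
	code := []byte{opPush1, 0x04, opJump, opStop, opJumpdest, opStop}
	fmt.Println(run(code)) // prints [0 2 4 5]
}
```

If the loop incremented the counter after JUMP as well, execution would resume one byte past the intended target.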

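The destinations type maps the hash of a smart contract's code to a bit vector that covers that code. Judging from analysis.go, an unset bit marks a byte as an opcode and a set bit marks it as data (the argument of a PUSH opcode). The EVM uses this to confirm that a jump lands on a real JUMPDEST instruction rather than on a data byte that happens to have the same value. analysis.go defines the destinations type like this: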

// destinations stores one map per contract (keyed by hash of code).
// The maps contain an entry for each location of a JUMPDEST instruction.
type destinations map[common.Hash]bitvec
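
One way to picture the JUMPDEST analysis named in that comment is the sketch below. It is not the code from analysis.go: I use a []bool instead of a packed bit vector, a plain [32]byte as the map key, and the function names dataBitmap and has are chosen for this illustration. The point is that a byte with the value of JUMPDEST only counts as a destination if it is an opcode and not the argument of a PUSH.

```go
package main

import "fmt"

const (
	opJumpdest = 0x5b
	opPush1    = 0x60
	opPush32   = 0x7f
)

// dataBitmap is a hypothetical stand-in for the bit vector: the entry at
// index i is true when byte i of the code is PUSH data rather than an opcode.
func dataBitmap(code []byte) []bool {
	bits := make([]bool, len(code))
	for pc := 0; pc < len(code); pc++ {
		op := code[pc]
		if op >= opPush1 && op <= opPush32 {
			n := int(op-opPush1) + 1 // number of immediate bytes
			for i := 1; i <= n && pc+i < len(code); i++ {
				bits[pc+i] = true
			}
			pc += n // skip over the immediate bytes
		}
	}
	return bits
}

// destinations caches one bitmap per contract, keyed by code hash.
type destinations map[[32]byte][]bool

// has reports whether dest is a JUMPDEST opcode that is not PUSH data.
func (d destinations) has(codeHash [32]byte, code []byte, dest int) bool {
	if dest < 0 || dest >= len(code) || code[dest] != opJumpdest {
		return false
	}
	bits, ok := d[codeHash]
	if !ok {
		bits = dataBitmap(code) // analyse once, then reuse
		d[codeHash] = bits
	}
	return !bits[dest]
}

func main() {
	// PUSH1 0x5b, JUMPDEST: byte 1 looks like JUMPDEST but is PUSH data.
	code := []byte{opPush1, 0x5b, opJumpdest}
	d := destinations{}
	var h [32]byte // placeholder; a real key would be the hash of the code
	fmt.Println(d.has(h, code, 1), d.has(h, code, 2)) // prints false true
}
```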

contract.go

This file defines smart contract behavior.

Imports

This comment applies to all of the Go source files in the entire project. I think the following absolute import would have been better specified as a relative import:

"github.com/ethereum/go-ethereum/common"

The relative import would look like this instead:

"../../common"

If relative imports were used instead of absolute imports that point to the GitHub repo, local changes to the project made by a developer would automatically be picked up. As currently written, absolute imports can cause local changes to be ignored in favor of the copy that the Go tool resolves for github.com/ethereum/go-ethereum (for example, when the working copy lives outside that path in GOPATH). It might take a software developer a while to realize that their changes are being ignored by most of the code base because absolute imports were used. It would then be painful for the developer to modify the affected source files throughout the project so that they used relative imports.

Types

The publicly visible AccountRef type is defined as:

// Account references are used during EVM initialisation and
// it's primary use is to fetch addresses. Removing this object
// proves difficult because of the cached jump destinations which
// are fetched from the parent contract (i.e. the caller), which
// is a ContractRef.
type AccountRef common.Address

The same file defines an Address method that converts an AccountRef to an Address:

// Address casts AccountRef to a Address
func (ar AccountRef) Address() common.Address { return (common.Address)(ar) }

The ContractRef interface is used by the Contract struct, which we'll see in a moment. The interface declares a single method, Address, which returns an Address.

// ContractRef is a reference to the contract's backing object
type ContractRef interface {
    Address() common.Address
}
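
The relationship between the two types can be sketched in a few lines. In this self-contained illustration, Address is a local stand-in for common.Address and callerOf is a hypothetical function; the point is only that AccountRef satisfies ContractRef because it has an Address method.

```go
package main

import "fmt"

// Address is a local stand-in for common.Address.
type Address [20]byte

// ContractRef and AccountRef mirror the definitions quoted above.
type ContractRef interface {
	Address() Address
}

type AccountRef Address

// Address converts the AccountRef back to the underlying Address.
func (ar AccountRef) Address() Address { return Address(ar) }

// callerOf is a hypothetical function that only needs a source of addresses.
func callerOf(ref ContractRef) Address { return ref.Address() }

func main() {
	addr := Address{19: 0x2a}
	var ref ContractRef = AccountRef(addr) // AccountRef satisfies ContractRef
	fmt.Println(callerOf(ref) == addr)     // prints true
}
```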

The Contract struct defines the behavior of Ethereum smart contracts, and is central to the topic, so here it is in all its glory:

type Contract struct {
    CallerAddress common.Address
    caller    ContractRef
    self      ContractRef

    jumpdests destinations // result of JUMPDEST analysis.

    Code     []byte
    CodeHash common.Hash
    CodeAddr *common.Address
    Input    []byte

    Gas   uint64
    value *big.Int

    Args []byte

    DelegateCall bool
}

CallerAddress is a publicly visible Address of the caller. caller and self are private ContractRefs, which as we know do little more than supply an Address.

jumpdests, a private field, has type destinations, discussed above. It holds the cached result of the JUMPDEST analysis, which the EVM consults when checking whether a jump target is valid.

Code is a publicly visible byte slice. It holds the contract's compiled EVM bytecode, not its source code.

CodeHash is the publicly visible hash of the Code, while CodeAddr is a publicly visible pointer to the Address (of the code, presumably).

Gas is the publicly visible amount of Ethereum gas allocated by the user for executing this smart contract, stored as an unsigned 64-bit integer.

value is a private pointer to a big integer. It holds the amount of ether (in wei) sent along with the call, not the result of executing the contract.

Args is a publicly visible byte slice; I am not sure what it is for.

DelegateCall is a publicly visible Boolean value, unclear if this means the smart contract was invoked using delegatecall. From the documentation: "This means that a contract can dynamically load code from a different address at runtime. Storage, current address and balance still refer to the calling contract, only the code is taken from the called address. This makes it possible to implement the “library” feature in Solidity: Reusable library code that can be applied to a contract’s storage, e.g. in order to implement a complex data structure."

contracts.go

This file defines the precompiled contracts, which are smart contracts implemented natively in Go, along with the function that runs them on the EVM.

Imports

The following imports are used:

  • crypto/sha256, from the Go standard library, implements the SHA224 and SHA256 hash algorithms as defined in FIPS 180-4.
  • errors, from the Go standard library, implements functions to manipulate errors, such as errors.New.
  • math/big implements arbitrary-precision arithmetic (big numbers).
  • Other packages in this project (go-ethereum):
    "github.com/ethereum/go-ethereum/common"
    "github.com/ethereum/go-ethereum/common/math"
    "github.com/ethereum/go-ethereum/crypto"
    "github.com/ethereum/go-ethereum/crypto/bn256"
    "github.com/ethereum/go-ethereum/params"
    

    Again, I think the above imports would have been better specified as relative imports:

    "../../common"
    "../../common/math"
    "../../crypto"
    "../../crypto/bn256"
    "../../params"
  • ripemd160 implements the RIPEMD-160 hash algorithm, a secure replacement for the MD4 and MD5 hash functions. These hashes are also termed RIPE message digests.

Type PrecompiledContract

PrecompiledContract is the interface for native Go smart contracts. This interface is used by precompiled contracts, as we will see next. Contract is a struct defined in contract.go.
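
As a rough picture of what such an interface could look like, here is a sketch built from the two methods described under Other Functions: one that prices a call and one that performs it. The identity type and the gas constants are assumptions made up for this example; they are not taken from contracts.go or the params package.

```go
package main

import "fmt"

// PrecompiledContract is sketched from the method descriptions in this
// article: RequiredGas prices the call and Run performs it.
type PrecompiledContract interface {
	RequiredGas(input []byte) uint64
	Run(input []byte) ([]byte, error)
}

// identity is a hypothetical precompile that returns a copy of its input.
type identity struct{}

// Illustrative gas figures, chosen for this sketch only.
const (
	baseGas    uint64 = 15
	perWordGas uint64 = 3
)

// RequiredGas charges a base fee plus a fee per 32-byte word of input.
func (identity) RequiredGas(input []byte) uint64 {
	words := uint64(len(input)+31) / 32 // round up to whole words
	return baseGas + words*perWordGas
}

// Run copies the input so the caller's slice is never aliased.
func (identity) Run(input []byte) ([]byte, error) {
	out := make([]byte, len(input))
	copy(out, input)
	return out, nil
}

func main() {
	var p PrecompiledContract = identity{}
	in := []byte("hello")
	out, err := p.Run(in)
	fmt.Println(p.RequiredGas(in), string(out), err) // prints 18 hello <nil>
}
```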

Pre-Compiled Contract Maps

These maps specify various types of cryptographic hashes and utility functions, accessed via their address.

PrecompiledContractsHomestead contains the default set of pre-compiled contract addresses used in the Frontier and Homestead releases of Ethereum: ecrecover, sha256hash, ripemd160hash and dataCopy.

PrecompiledContractsByzantium contains the default set of pre-compiled contract addresses used in the Byzantium Ethereum release. All of the previously defined pre-compiled contract addresses are provided in Byzantium, plus: bigModExp, bn256Add, bn256ScalarMul and bn256Pairing.

I'm not happy about the code duplication, whereby the contents of PrecompiledContractsHomestead are incorporated into PrecompiledContractsByzantium by listing the values again; this would be better expressed by referencing the values of PrecompiledContractsHomestead instead of duplicating them.
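
Here is a minimal sketch of the kind of change I have in mind. To keep it self-contained, the map keys are single bytes and the values are strings standing in for the real addresses and contract instances; the extend helper is hypothetical.

```go
package main

import "fmt"

// precompile is a hypothetical stand-in for a PrecompiledContract value.
type precompile string

// homestead uses small byte keys as stand-ins for contract addresses.
var homestead = map[byte]precompile{
	1: "ecrecover",
	2: "sha256hash",
	3: "ripemd160hash",
	4: "dataCopy",
}

// extend returns a new map holding base plus extra, leaving base untouched.
func extend(base, extra map[byte]precompile) map[byte]precompile {
	out := make(map[byte]precompile, len(base)+len(extra))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range extra {
		out[k] = v
	}
	return out
}

// byzantium lists only what is new; the rest comes from homestead.
var byzantium = extend(homestead, map[byte]precompile{
	5: "bigModExp",
	6: "bn256Add",
	7: "bn256ScalarMul",
	8: "bn256Pairing",
})

func main() {
	fmt.Println(len(homestead), len(byzantium), byzantium[1], byzantium[8])
}
```

With this arrangement, a correction to a Homestead entry would carry over to the Byzantium map without a second edit.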

Contract Evaluator Function

The RunPrecompiledContract function runs and evaluates the output of a precompiled contract. It accepts three parameters:

  • A PrecompiledContract instance.
  • A byte slice of input data.
  • A pointer to a Contract, defined in contract.go, discussed above.

The function returns:

  • A byte slice containing the output of the contract.
  • An error value, which could be nil.
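
Given those parameters and return values, a plausible shape for such a function is to charge the contract for the gas the precompile requires and run it only if enough gas remains. The sketch below shows that shape and is not a copy of the real function. The names runPrecompiled, useGas, errOutOfGas, and echo are assumptions for this illustration, and the Contract struct is reduced to the one field the sketch needs.

```go
package main

import (
	"errors"
	"fmt"
)

var errOutOfGas = errors.New("out of gas")

type PrecompiledContract interface {
	RequiredGas(input []byte) uint64
	Run(input []byte) ([]byte, error)
}

// Contract keeps only the field this sketch needs.
type Contract struct{ Gas uint64 }

// useGas is a hypothetical helper that deducts gas if enough remains.
func (c *Contract) useGas(gas uint64) bool {
	if c.Gas < gas {
		return false
	}
	c.Gas -= gas
	return true
}

// runPrecompiled charges for the call first and only then runs it.
func runPrecompiled(p PrecompiledContract, input []byte, c *Contract) ([]byte, error) {
	if !c.useGas(p.RequiredGas(input)) {
		return nil, errOutOfGas
	}
	return p.Run(input)
}

// echo is a hypothetical precompile costing one gas per input byte.
type echo struct{}

func (echo) RequiredGas(in []byte) uint64  { return uint64(len(in)) }
func (echo) Run(in []byte) ([]byte, error) { return append([]byte(nil), in...), nil }

func main() {
	c := &Contract{Gas: 5}
	out, err := runPrecompiled(echo{}, []byte("abc"), c)
	fmt.Println(string(out), err, c.Gas) // prints abc <nil> 2
	_, err = runPrecompiled(echo{}, []byte("abc"), c)
	fmt.Println(err, c.Gas) // prints out of gas 2
}
```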

Other Functions

  • RunPrecompiledContract – runs and evaluates the output of a precompiled contract; returns the output as a byte slice and an error.
  • RequiredGas (implemented by each precompiled contract type) – computes the gas required for the input data, specified as a byte slice, and returns a uint64.
  • Run (implemented by each precompiled contract type) – executes the precompiled contract on the input data, specified as a byte slice, and returns the result as a byte slice (left-padded where the contract calls for it) and an error.
  • newCurvePoint – unmarshals a binary blob into a bn256 elliptic curve point. BN curves are elliptic curves suitable for cryptographic pairings, which enable secure and efficient cryptographic schemes. See the IETF paper on Barreto-Naehrig Curves for more information.
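
The Run entry above mentions a left-padded result. As a small illustration of what left padding means for a byte slice, here is a sketch of a helper that pads a short value with leading zero bytes up to a fixed width. The name leftPad and the 4-byte sample value are my own choices for this example.

```go
package main

import "fmt"

// leftPad is a hypothetical helper that returns b left-padded with zero
// bytes to length n. If b is already n bytes or longer, it is returned as is.
func leftPad(b []byte, n int) []byte {
	if len(b) >= n {
		return b
	}
	out := make([]byte, n)
	copy(out[n-len(b):], b)
	return out
}

func main() {
	digest := []byte{0xde, 0xad, 0xbe, 0xef} // stand-in for a short hash value
	padded := leftPad(digest, 32)
	fmt.Println(len(padded), padded[27], padded[28]) // prints 32 0 222
}
```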