Published 2018-06-13.
Time to read: 6 minutes.
April 27, 2020 Update
This was a sample work in progress as part of a proposal to the Ethereum Foundation Grants committee. They declined to fund this activity, but gave no reason and no feedback as to what might be acceptable. The preparation of my proposal took weeks, and the preliminary feedback that I had received from many knowledgeable people was that it had great value.
I felt that the evaluation process was broken; in fact, the entire organization was broken, and there was little hope that Ethereum would ever become a professional organization. Two years later, I believe history has proved me right. This was the last blockchain-related initiative I participated in.
I no longer maintain web3j-scala, an Ethereum-related project for Scala programmers.
I created that open source project and worked on it for free for 3 years.
It has been forked and I’ve been told it is or was used in production.
However, I see no reason to continue working for free on it.
Others can carry the project forward if they want.
This is a brief walkthrough of some of the core source files for smart contracts in the official Go language Ethereum implementation, which includes the geth command-line Ethereum client program, along with many other programs.
Ethereum clients include an implementation of the Ethereum Virtual Machine (EVM). They are able to parse and verify the Ethereum blockchain, including smart contracts, and they provide interfaces to create transactions and mine blocks.
I’ve added some suggestions for how the source code might be improved. If there is general agreement that these suggestions make sense (tell me in the comments!) then I’ll create a pull request.
License
This Ethereum client project was released under the GNU Lesser General Public License, version 3 or later, which permits use of the code as a library in proprietary programs.
Source Files
The gocloc program counted the following source files and lines:
Language | Files | Blank Lines | Comment Lines | Code Lines |
---|---|---|---|---|
Go | 1,824 | 58,134 | 81,861 | 639,435 |
C | 55 | 17,257 | 30,909 | 84,719 |
C Header | 97 | 2,559 | 6,318 | 15,083 |
Markdown | 88 | 3,152 | 0 | 9,175 |
JavaScript | 13 | 1,845 | 4,495 | 7,986 |
Assembly | 39 | 557 | 957 | 3,783 |
JSON | 17 | 0 | 0 | 2,065 |
Protocol Buffers | 2 | 113 | 40 | 1,030 |
Plain Text | 11 | 217 | 0 | 954 |
C++ | 4 | 132 | 102 | 937 |
BASH | 10 | 178 | 315 | 931 |
Perl | 10 | 268 | 1,289 | 879 |
JSX | 11 | 119 | 245 | 722 |
XML | 9 | 0 | 0 | 651 |
M4 | 4 | 79 | 99 | 649 |
YAML | 20 | 77 | 42 | 581 |
NSIS | 5 | 86 | 154 | 446 |
Java | 4 | 143 | 187 | 438 |
Makefile | 11 | 101 | 84 | 381 |
Python | 6 | 154 | 250 | 339 |
HTML | 3 | 15 | 9 | 245 |
Solidity | 7 | 56 | 171 | 213 |
Bourne Shell | 6 | 23 | 25 | 119 |
CMake | 1 | 9 | 0 | 35 |
Awk | 1 | 4 | 4 | 17 |
TOML | 1 | 0 | 0 | 3 |
Total | 2,260 | 85,278 | 127,556 | 771,825 |
Packages
I used the following incantation to discover that geth defines 244 packages:
$ grep -rh "^package" | grep -v "not installed" | \
tr -d ';' | sed 's^//.*^^' | awk '{$1=$1};1' | \
sort | uniq | wc -l
I won't list them all.
The godoc for the project contains much of the following documentation for the top-level packages.
I provided the rest of the information from disparate sources, including reading the source code:
Package | Description |
---|---|
accounts | implements high-level Ethereum account management |
bmt | provides a binary Merkle tree implementation |
cmd | contains the command-line tools; most tools support the --help option |
common | contains various helper functions worth checking out |
consensus | implements different Ethereum consensus engines (which must conform to the Engine interface): clique implements proof-of-authority consensus, and ethash implements proof-of-work consensus |
console | Ethereum implements a JavaScript runtime environment (JSRE) that can be used in either interactive (console) or non-interactive (script) mode. Ethereum's JavaScript console exposes the full web3 JavaScript Dapp API and the admin API. This package implements JSRE for the geth console and geth attach subcommands. |
containers | |
contracts | |
core | implements the Ethereum consensus protocol, the Ethereum Virtual Machine, and other miscellaneous important bits |
crypto | cryptographic implementations |
dashboard | |
eth | implements the Ethereum protocol |
ethclient | provides a client for the Ethereum RPC API |
ethdb | |
ethstats | implements the network stats reporting service |
event | deals with subscriptions to real-time events |
internal | debugging support, JavaScript dependencies, testing support |
les | implements the Light Ethereum Subprotocol |
light | implements on-demand retrieval capable state and chain objects for the Ethereum Light Client |
log | provides an opinionated, simple toolkit for best-practice logging that is both human and machine readable |
metrics | port of Coda Hale's Metrics library. It is unclear why this was not implemented as a separate library. |
miner | implements Ethereum block creation and mining |
mobile | contains the simplified mobile APIs to go-ethereum |
node | sets up multi-protocol Ethereum nodes |
p2p | implements the Ethereum p2p network protocols: Node Discovery Protocol, RLPx v5 Topic Discovery Protocol, Ethereum Node Records as defined in EIP-778, common network port mapping protocols, and p2p network simulation |
params | |
rlp | implements the RLP serialization format |
rpc | provides access to the exported methods of an object across a network or other I/O connection |
signer | |
swarm | |
tests | implements execution of Ethereum JSON tests |
trie | implements Merkle Patricia Tries |
vendor | contains a minimal framework for creating and organizing command line Go applications, and a rich testing extension for Go's testing package |
whisper | implements the Whisper protocol |
I used the following incantation to list the package names:
find . -maxdepth 1 -type d | sed 's^\./^^' | sed '/\..*/d'
The build/ directory does not contain a Go source package; instead, it contains scripts and configurations for building the package in various environments.
Smart Contract Source Code
The core/vm directory contains the files that implement the EVM.
These files are part of the vm package.
Let's look at two of them:
- contract.go, which defines smart contract behavior.
- contracts.go, responsible for executing precompiled smart contracts on the EVM.
Referenced Types
Two of the types used in the source files that we would like to understand are defined in common/types.go.
Let's look at them first.
Address is defined as an array of 20 bytes:
const (
    HashLength    = 32
    AddressLength = 20
)

// Address represents the 20 byte address of an Ethereum account.
type Address [AddressLength]byte
Hash is defined as an array of 32 bytes:
// Hash represents the 32 byte Keccak256 hash of arbitrary data.
type Hash [HashLength]byte
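Because these are fixed-size arrays rather than slices, values of these types are copied on assignment and can be compared with ==. The following is a minimal sketch that re-declares Address locally so it can run on its own. The helper bytesToAddress is a hypothetical name of mine, and its choice to right-align the input and keep only the last 20 bytes is an assumption of this sketch, not something quoted from common/types.go.

```go
package main

import "fmt"

const (
	HashLength    = 32
	AddressLength = 20
)

// Address is a local copy of the 20-byte array type quoted above.
type Address [AddressLength]byte

// bytesToAddress is a hypothetical helper for this sketch: it right-aligns b
// in the array and keeps only the last AddressLength bytes if b is too long.
func bytesToAddress(b []byte) Address {
	var a Address
	if len(b) > AddressLength {
		b = b[len(b)-AddressLength:]
	}
	copy(a[AddressLength-len(b):], b)
	return a
}

func main() {
	a := bytesToAddress([]byte{0x01, 0x02})
	b := a      // arrays are values, so this copies all 20 bytes
	b[0] = 0xff // changing the copy leaves a untouched
	fmt.Printf("%x\n", a[:])
	fmt.Println(a == b)
}
```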
The opcodes for each version of the EVM are defined in jump_table.go.
The operation struct defines the properties:
type operation struct {
    // execute is the operation function
    execute executionFunc
    // gasCost is the gas function and returns the gas required for execution
    gasCost gasFunc
    // validateStack validates the stack (size) for the operation
    validateStack stackValidationFunc
    // memorySize returns the memory size required for the operation
    memorySize memorySizeFunc

    halts   bool // indicates whether the operation should halt further execution
    jumps   bool // indicates whether the program counter should not increment
    writes  bool // determines whether this a state modifying operation
    valid   bool // indication whether the retrieved operation is valid and known
    reverts bool // determines whether the operation reverts state (implicitly halts)
    returns bool // determines whether the operations sets the return data content
}
Notice the jumps property, a Boolean, which if set indicates that the program counter should not increment after executing any form of jump opcode.
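To show why an interpreter needs such a flag, here is a toy dispatch loop. It is only a sketch: the op struct, the run function and the demo program are hypothetical stand-ins that I made up for illustration, and they are far simpler than the real operation struct and interpreter.

```go
package main

import "fmt"

// op is a pared-down, hypothetical stand-in for the operation struct.
type op struct {
	execute func(pc *uint64) // may overwrite *pc
	jumps   bool             // true: execute sets the program counter itself
	halts   bool             // true: stop after executing
}

// run executes the program and returns the program counter values it visited.
func run(program []op) []uint64 {
	var visited []uint64
	pc := uint64(0)
	for pc < uint64(len(program)) {
		visited = append(visited, pc)
		o := program[pc]
		o.execute(&pc)
		if o.halts {
			break
		}
		if !o.jumps {
			pc++ // ordinary opcodes fall through to the next instruction
		}
	}
	return visited
}

// demoProgram jumps from index 1 straight to index 3, skipping index 2.
func demoProgram() []op {
	nop := op{execute: func(pc *uint64) {}}
	return []op{
		nop,
		{execute: func(pc *uint64) { *pc = 3 }, jumps: true},
		nop,
		{execute: func(pc *uint64) {}, halts: true},
	}
}

func main() {
	fmt.Println(run(demoProgram())) // prints [0 1 3]
}
```

If the loop incremented the program counter unconditionally, the jump at index 1 would land on index 4 instead of index 3.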
The destinations type maps the hash of a smart contract's code to a bit vector covering that code.
According to the comment in the source, it records the location of each JUMPDEST instruction, which is where the EVM's jump opcodes are allowed to land.
analysis.go defines the destinations type like this:
// destinations stores one map per contract (keyed by hash of code).
// The maps contain an entry for each location of a JUMPDEST instruction.
type destinations map[common.Hash]bitvec
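The kind of analysis that such a per-contract entry could cache can be sketched as follows. This is my own simplified illustration, not the code in analysis.go: it uses a []bool instead of a packed bit vector, it leaves out the map keyed by code hash, and the function names analyse and validJumpDest are hypothetical. It assumes that a byte is a valid jump target only if it is the JUMPDEST opcode (0x5b) and is not part of the data that follows a PUSH1 to PUSH32 opcode (0x60 to 0x7f).

```go
package main

import "fmt"

const (
	opJUMPDEST = 0x5b
	opPUSH1    = 0x60
	opPUSH32   = 0x7f
)

// analyse reports, for each byte of code, whether it is PUSH data (true)
// or an opcode (false).
func analyse(code []byte) []bool {
	isData := make([]bool, len(code))
	for pc := 0; pc < len(code); pc++ {
		c := code[pc]
		if c >= opPUSH1 && c <= opPUSH32 {
			n := int(c-opPUSH1) + 1 // number of data bytes that follow
			for i := 1; i <= n && pc+i < len(code); i++ {
				isData[pc+i] = true
			}
			pc += n // skip over the data bytes
		}
	}
	return isData
}

// validJumpDest reports whether dest is a JUMPDEST opcode and not PUSH data.
func validJumpDest(code []byte, isData []bool, dest int) bool {
	return dest >= 0 && dest < len(code) &&
		code[dest] == opJUMPDEST && !isData[dest]
}

func main() {
	// PUSH1 0x5b, JUMPDEST: the first 0x5b is data, the second is an opcode.
	code := []byte{0x60, 0x5b, 0x5b}
	m := analyse(code)
	fmt.Println(validJumpDest(code, m, 1), validJumpDest(code, m, 2))
}
```

Caching the result per code hash would avoid repeating the scan every time the same contract code is executed.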
contract.go
This file defines smart contract behavior.
Imports
This comment applies to all of the Go source files in the entire project. I think the following absolute import would have been better specified as a relative import:
"github.com/ethereum/go-ethereum/common"
The relative import would look like this instead:
"../../common"
If relative imports were used instead of absolute imports that point to the GitHub repository, local changes to the project made by a developer would automatically be picked up. As currently written, absolute imports cause local changes to be ignored in favor of the version on GitHub. It might take a software developer a while to realize that their changes are ignored by most of the code base because absolute imports were used. It would then be painful for the developer to modify the affected source files throughout the project so that they used relative imports.
Types
The publicly visible AccountRef type is defined as:
// Account references are used during EVM initialisation and
// it's primary use is to fetch addresses. Removing this object
// proves difficult because of the cached jump destinations which
// are fetched from the parent contract (i.e. the caller), which
// is a ContractRef.
type AccountRef common.Address
The same file defines a type cast from AccountRef to Address:
// Address casts AccountRef to a Address
func (ar AccountRef) Address() common.Address { return (common.Address)(ar) }
The ContractRef interface is used by the Contract struct, which we'll see in a moment. This ContractRef interface consists of a single method, Address(), which returns an Address.
// ContractRef is a reference to the contract's backing object
type ContractRef interface {
    Address() common.Address
}
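Since AccountRef has an Address() method, it satisfies ContractRef without any explicit declaration. The sketch below re-declares the three types locally (with a plain [20]byte array standing in for common.Address) so that it runs on its own; the describe helper is a hypothetical function added only to show an AccountRef being passed where a ContractRef is expected.

```go
package main

import "fmt"

// Address stands in for common.Address in this self-contained sketch.
type Address [20]byte

// ContractRef mirrors the interface quoted above.
type ContractRef interface {
	Address() Address
}

// AccountRef mirrors the type quoted above.
type AccountRef Address

// Address converts the AccountRef back to its underlying Address.
func (ar AccountRef) Address() Address { return Address(ar) }

// describe is a hypothetical helper that accepts any ContractRef.
func describe(ref ContractRef) string {
	addr := ref.Address()
	return fmt.Sprintf("0x%x", addr[:])
}

func main() {
	var raw Address
	raw[19] = 0x2a
	// AccountRef(raw) is accepted because it implements ContractRef.
	fmt.Println(describe(AccountRef(raw)))
}
```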
The Contract struct defines the behavior of Ethereum smart contracts, and is central to the topic, so here it is in all its glory:
type Contract struct {
    CallerAddress common.Address
    caller        ContractRef
    self          ContractRef

    jumpdests destinations // result of JUMPDEST analysis.

    Code     []byte
    CodeHash common.Hash
    CodeAddr *common.Address
    Input    []byte

    Gas   uint64
    value *big.Int

    Args []byte

    DelegateCall bool
}
CallerAddress is a publicly visible Address of the caller.
caller and self are private ContractRefs, which as we know just provide an Address.
jumpdests, a private field, has type destinations; according to its comment, it holds the result of the JUMPDEST analysis of the contract's code.
Code is a publicly visible byte slice.
We don't yet know if this is the smart contract source code, compiled code, or something else.
CodeHash is the publicly visible hash of the Code, while CodeAddr is a publicly visible pointer to the Address (of the code, presumably).
Gas is the publicly visible amount of Ethereum gas allocated by the user for executing this smart contract, stored as an unsigned 64-bit integer.
value is a private pointer to a big integer.
Possibly this might be the result of executing the contract?
Args is a publicly visible byte slice; I am not sure what it is for.
DelegateCall is a publicly visible Boolean value; it is unclear if this means the smart contract was invoked using delegatecall.
From the documentation: "This means that a contract can dynamically load code from a different address at runtime. Storage, current address and balance still refer to the calling contract, only the code is taken from the called address.
This makes it possible to implement the “library” feature in Solidity: Reusable library code that can be applied to a contract’s storage, e.g. in order to implement a complex data structure."
contracts.go
This file defines the precompiled smart contracts and is responsible for executing them on the EVM.
Imports
The following imports are used:
- Package sha256 from the crypto project implements the SHA224 and SHA256 hash algorithms as defined in FIPS 180-4.
- errors provides the Go language's simple error handling primitives, which work with the built-in error type.
- math/big implements arbitrary-precision arithmetic (big numbers).
- Other packages in this project (go-ethereum):
"github.com/ethereum/go-ethereum/common"
"github.com/ethereum/go-ethereum/common/math"
"github.com/ethereum/go-ethereum/crypto"
"github.com/ethereum/go-ethereum/crypto/bn256"
"github.com/ethereum/go-ethereum/params"
Again, I think the above imports would have been better specified as relative imports:
"../../common"
"../../common/math"
"../../crypto"
"../../crypto/bn256"
"../../params"
- ripemd160 implements the RIPEMD-160 hash algorithm, a secure replacement for the MD4 and MD5 hash functions. These hashes are also termed RIPE message digests.
Type PrecompiledContract
PrecompiledContract is the interface for native Go smart contracts.
This interface is used by precompiled contracts, as we will see next.
Contract is a struct defined in contract.go.
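To give a feel for what implementing such an interface involves, here is a minimal sketch. The two method names, RequiredGas and Run, are the ones this article lists for contracts.go; their exact signatures here are my assumption. The echo type and its gas numbers are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// PrecompiledContract is an assumed shape for the interface, built from the
// RequiredGas and Run method names mentioned in this article.
type PrecompiledContract interface {
	RequiredGas(input []byte) uint64
	Run(input []byte) ([]byte, error)
}

// echo is a hypothetical native contract that returns its input unchanged.
type echo struct{}

// RequiredGas charges a base fee plus a fee per 32-byte word of input.
// The numbers 15 and 3 are illustrative values chosen for this sketch.
func (echo) RequiredGas(input []byte) uint64 {
	words := uint64(len(input)+31) / 32
	return 15 + 3*words
}

// Run returns a copy of the input and never fails.
func (echo) Run(input []byte) ([]byte, error) {
	out := make([]byte, len(input))
	copy(out, input)
	return out, nil
}

func main() {
	var p PrecompiledContract = echo{}
	out, err := p.Run([]byte("hi"))
	fmt.Println(p.RequiredGas([]byte("hi")), string(out), err)
}
```

Separating the gas calculation from execution lets a caller decide whether enough gas is available before doing any work.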
Pre-Compiled Contract Maps
These maps specify various types of cryptographic hashes and utility functions, accessed via their address.
PrecompiledContractsHomestead contains the default set of pre-compiled contract addresses used in the Frontier and Homestead releases of Ethereum: ecrecover, sha256hash, ripemd160hash and dataCopy.
PrecompiledContractsByzantium contains the default set of pre-compiled contract addresses used in the Byzantium Ethereum release.
All of the previously defined pre-compiled contract addresses are provided in Byzantium, plus: bigModExp, bn256Add, bn256ScalarMul and bn256Pairing.
I’m not happy about the code duplication, whereby the contents of PrecompiledContractsHomestead are incorporated into PrecompiledContractsByzantium by listing the values again; this would be better expressed by referencing the values of PrecompiledContractsHomestead instead of duplicating them.
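One way the duplication could be avoided is to build the later map from the earlier one. The sketch below is only an illustration of that idea: it maps addresses to contract names (strings) rather than to real contract values, and the helpers addr and extend, as well as the lowercase map names, are hypothetical.

```go
package main

import "fmt"

// Address stands in for common.Address in this self-contained sketch.
type Address [20]byte

// addr is a hypothetical helper that builds an address whose last byte is n.
func addr(n byte) Address {
	var a Address
	a[19] = n
	return a
}

// extend returns a new map holding every entry of base plus every entry of
// extra; base itself is left unmodified.
func extend(base, extra map[Address]string) map[Address]string {
	out := make(map[Address]string, len(base)+len(extra))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range extra {
		out[k] = v
	}
	return out
}

// homestead lists the four original precompiled contracts once.
var homestead = map[Address]string{
	addr(1): "ecrecover",
	addr(2): "sha256hash",
	addr(3): "ripemd160hash",
	addr(4): "dataCopy",
}

// byzantium reuses homestead instead of repeating its entries.
var byzantium = extend(homestead, map[Address]string{
	addr(5): "bigModExp",
	addr(6): "bn256Add",
	addr(7): "bn256ScalarMul",
	addr(8): "bn256Pairing",
})

func main() {
	fmt.Println(len(homestead), len(byzantium))
}
```

Copying into a fresh map, rather than adding to the existing one, keeps the Homestead set unchanged for code that still needs it.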
Contract Evaluator Function
The RunPrecompiledContract function runs and evaluates the output of a precompiled contract.
It accepts three parameters:
- A PrecompiledContract instance.
- A byte array of input data.
- A reference to a Contract, defined in contract.go, discussed above.
The function returns:
- A byte array containing the output of the contract.
- An error value, which could be nil.
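A function with those parameters and return values might look roughly like the sketch below. Everything here is a simplified stand-in: the Contract struct is reduced to its Gas field, useGas and fixedCost are hypothetical helpers, and the step of charging the required gas before running is my assumption about what such a function has to do.

```go
package main

import (
	"errors"
	"fmt"
)

var errOutOfGas = errors.New("out of gas")

// PrecompiledContract is an assumed shape for the interface.
type PrecompiledContract interface {
	RequiredGas(input []byte) uint64
	Run(input []byte) ([]byte, error)
}

// Contract is reduced to the one field this sketch needs.
type Contract struct{ Gas uint64 }

// useGas is a hypothetical helper: it deducts gas if enough is available.
func (c *Contract) useGas(gas uint64) bool {
	if c.Gas < gas {
		return false
	}
	c.Gas -= gas
	return true
}

// runPrecompiled charges the required gas, then runs the contract.
func runPrecompiled(p PrecompiledContract, input []byte, contract *Contract) ([]byte, error) {
	if !contract.useGas(p.RequiredGas(input)) {
		return nil, errOutOfGas
	}
	return p.Run(input)
}

// fixedCost is a hypothetical contract with a constant gas price.
type fixedCost struct{ cost uint64 }

func (f fixedCost) RequiredGas([]byte) uint64        { return f.cost }
func (f fixedCost) Run(input []byte) ([]byte, error) { return input, nil }

func main() {
	c := &Contract{Gas: 100}
	out, err := runPrecompiled(fixedCost{60}, []byte{1}, c)
	fmt.Println(out, err, c.Gas) // first call succeeds and leaves 40 gas
	_, err = runPrecompiled(fixedCost{60}, []byte{1}, c)
	fmt.Println(err, c.Gas) // second call fails, gas is unchanged
}
```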
Other Functions
- RunPrecompiledContract – runs and evaluates the output of a precompiled contract; returns the output as a byte array and an error.
- RequiredGas (implemented by each precompiled contract type) – computes the gas required for input data, specified as a byte array, and returns a uint64.
- Run (implemented by each precompiled contract type) – runs the precompiled contract on input data, specified as a byte array, and returns the result as a left-padded byte array and an error.
- newCurvePoint – unmarshals a binary blob into a bn256 elliptic curve point. BN curves are elliptic curves suitable for cryptographic pairings, which enable cryptographic schemes with high security and efficiency. See the IETF paper on Barreto-Naehrig Curves for more information.
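The "left-padded byte array" mentioned for Run can be illustrated with a short sketch. The leftPad function is a hypothetical helper of mine, and the 20-byte and 32-byte sizes are assumptions chosen to match a 20-byte digest being returned in a 32-byte word.

```go
package main

import "fmt"

// leftPad returns b with zero bytes added on the left until it is size bytes
// long. Input that is already long enough is returned unchanged.
func leftPad(b []byte, size int) []byte {
	if len(b) >= size {
		return b
	}
	out := make([]byte, size)
	copy(out[size-len(b):], b)
	return out
}

func main() {
	digest := make([]byte, 20) // stand-in for a 20-byte hash
	digest[0] = 0xab
	padded := leftPad(digest, 32)
	// The digest now occupies the last 20 bytes of a 32-byte result.
	fmt.Println(len(padded), padded[12] == 0xab)
}
```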