Testing Smart Contracts: Recommendations and Security

Because smart contracts are immutable once deployed, so are their bugs. Testing is therefore a central strategy for finding defects and for verifying that the smart contract system behaves as expected before deployment. This section offers guidelines and best practices for testing Solidity and Vyper smart contracts. It presumes that the reader, being a reasonable adult, already appreciates why testing matters. We therefore do not delve into the "why" of testing and focus instead on the "how."

Preparation

Preparation stands as a critical element in any form of testing. For effective testing of smart contracts, it is crucial to articulate clear testing objectives. These could revolve around various areas, such as system correctness, accounting-related consistency, gas efficiency, or even examining specific user flows.

It's also essential to clearly define the use cases. Typically, test suites are organized based on these use cases, which describe the interactions and communication between the system and a user to achieve a specific goal. As they simulate potential user-system interaction, use cases offer an excellent avenue for uncovering defects that users are likely to encounter in real-world scenarios.

To streamline the testing process, it is recommended to develop a test plan: a detailed document describing the scope, approach, resources, and schedule of the intended test activities. This plan acts as a blueprint for testing the software and helps determine the effort required to validate the quality of the software under test.

Once the test plan is outlined, it can be implemented as a test suite. An automated suite simplifies the use of continuous integration tools, which can execute it whenever the code base changes.

Finally, it's important to remember that testing will inevitably uncover bugs. As such, from a project management perspective, sufficient time should be set aside for addressing these bugs as they arise from test suite failures. This preparedness ensures the testing process does not stall, promoting a more effective and efficient project workflow.

Test Types

Software testing is a discipline that extends beyond the blockchain realm. Therefore, a plethora of inspiration can be drawn from traditional software engineering and quality assurance procedures. Every system should incorporate a comprehensive unit testing strategy and end-to-end tests. For systems that interact with third-party smart contracts, integration tests serve as an efficient method of covering unknown code.

Unit tests primarily focus on individual functions and components. Every function operates based on certain assumptions about the parameters it receives and the system state it interacts with. Unit tests should thus encompass all possible scenarios, both valid and invalid, in which a function can be called to uncover logic errors. As the first layer of the test suite, unit tests provide a critical validation process for the basic building blocks upon which the system builds more complex behavior.
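As a minimal sketch of what "valid and invalid scenarios" can look like for a single function, the example below tests a transfer function. It uses a hypothetical in-memory Python class, SimpleToken, as a stand-in for a real contract, and Python's built-in unittest module instead of any of the frameworks discussed later. All names are assumptions made for illustration.

```python
import unittest


class InsufficientBalance(Exception):
    """Raised when a transfer exceeds the sender's balance (models a revert)."""


class SimpleToken:
    """Hypothetical in-memory stand-in for a token contract."""

    def __init__(self, owner: str, supply: int) -> None:
        self.balances = {owner: supply}

    def balance_of(self, account: str) -> int:
        return self.balances.get(account, 0)

    def transfer(self, sender: str, recipient: str, amount: int) -> None:
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.balance_of(sender) < amount:
            raise InsufficientBalance(sender)
        self.balances[sender] = self.balance_of(sender) - amount
        self.balances[recipient] = self.balance_of(recipient) + amount


class TransferUnitTest(unittest.TestCase):
    def setUp(self) -> None:
        # Fresh state for every test keeps the cases independent.
        self.token = SimpleToken(owner="alice", supply=100)

    def test_valid_transfer_moves_funds(self) -> None:
        self.token.transfer("alice", "bob", 40)
        self.assertEqual(self.token.balance_of("alice"), 60)
        self.assertEqual(self.token.balance_of("bob"), 40)

    def test_transfer_of_full_balance_is_allowed(self) -> None:
        self.token.transfer("alice", "bob", 100)
        self.assertEqual(self.token.balance_of("alice"), 0)

    def test_overdraft_is_rejected_and_state_unchanged(self) -> None:
        with self.assertRaises(InsufficientBalance):
            self.token.transfer("alice", "bob", 101)
        self.assertEqual(self.token.balance_of("alice"), 100)
        self.assertEqual(self.token.balance_of("bob"), 0)

    def test_unknown_sender_is_rejected(self) -> None:
        with self.assertRaises(InsufficientBalance):
            self.token.transfer("mallory", "bob", 1)
```

Saved to a file, the cases can be run with python -m unittest. Each test covers one assumption the function makes about its parameters or about the state it reads, including the boundary value (the full balance) and the first value past it.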

Integration tests form the next layer of testing, building upon unit tests. Once the functionality of individual components has been confirmed through unit tests, the integration of these components must be checked. This level of testing focuses on the interactions between components, whether these are internal or external to the system. For instance, if a component makes an external call to a third-party system, integration tests can verify that the system behaves correctly when the external call returns an invalid value or reverts. This is especially useful when the system integrates with an upgradeable third-party system that may not always perform as expected. Given that these relations are not tested during unit testing, integration tests are considered an absolute necessity.
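The external-call scenario can be sketched with a test double. In the example below, Lending is a hypothetical component that depends on a third-party price oracle, and StubOracle is a stand-in that can be told to return an invalid value or to revert. The classes are plain Python models written for this illustration, not the API of any real protocol or framework.

```python
import unittest


class CallReverted(Exception):
    """Models a reverted external call."""


class InvalidPrice(Exception):
    """Raised by the component under test when the oracle answer is unusable."""


class Lending:
    """Hypothetical component that depends on a third-party price oracle."""

    def __init__(self, oracle) -> None:
        self.oracle = oracle

    def collateral_value(self, amount: int) -> int:
        try:
            price = self.oracle.latest_price()
        except CallReverted as exc:
            # Surface one well-defined error instead of leaking the revert.
            raise InvalidPrice("oracle call reverted") from exc
        if price <= 0:
            raise InvalidPrice(f"oracle returned {price}")
        return amount * price


class StubOracle:
    """Test double standing in for the external system."""

    def __init__(self, price: int = 1, reverts: bool = False) -> None:
        self.price = price
        self.reverts = reverts

    def latest_price(self) -> int:
        if self.reverts:
            raise CallReverted()
        return self.price


class OracleIntegrationTest(unittest.TestCase):
    def test_valid_price_is_used(self) -> None:
        self.assertEqual(Lending(StubOracle(price=3)).collateral_value(10), 30)

    def test_zero_price_is_rejected(self) -> None:
        with self.assertRaises(InvalidPrice):
            Lending(StubOracle(price=0)).collateral_value(10)

    def test_reverting_oracle_is_handled(self) -> None:
        with self.assertRaises(InvalidPrice):
            Lending(StubOracle(reverts=True)).collateral_value(10)
```

The point of the double is control: the real third-party system may never misbehave during a test run, while the stub can be forced into every failure mode the integration has to survive.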

End-to-end or system testing takes a holistic approach to testing the system, ensuring that all basic components function correctly once integrated. Many projects use this strategy to swiftly enhance their test coverage and confirm that the system works from the start. User journeys should be defined to structure these tests, including successful and invalid paths. Portions of the test suite can also serve as examples for developers to integrate correctly with the system. Before deploying the system, end-to-end tests should be defined and executed successfully to ensure that all valid user interactions work correctly and all invalid user interactions are handled as required.

Fuzz testing, or fuzzing, is an automated software testing method that injects invalid, malformed, or unexpected inputs into a system. This technique is incredibly powerful in exposing security vulnerabilities. Specialized fuzzing tools are required, the specifics of which depend on the technology. Fortunately, the Ethereum Virtual Machine (EVM) is less complex than architectures like x86-64, which makes specialized fuzzers highly effective, even on larger code bases. However, increased code complexity can result in a combinatorial explosion of possible inputs and states, making it less likely that a fuzzer yields meaningful results. While fuzz testing cannot provide absolute guarantees, it has proven to be a solid addition to existing tests. An added advantage is that fuzzers can be run indefinitely, as they generate their own test cases and report back when test failure conditions are met. This is why fuzz tests are often run continuously against a specific version of the code, either one that is already deployed or a frozen snapshot, while development continues elsewhere in the code base.
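To make the idea concrete, here is a deliberately tiny fuzzer written with Python's standard library only. It is a sketch of the principle, not a replacement for a real EVM fuzzer: Ledger and BuggyLedger are hypothetical in-memory models, and the failure condition is a single invariant (the total supply must stay constant and no balance may go negative).

```python
import random


class Ledger:
    """Hypothetical in-memory token ledger used as the fuzzing target."""

    def __init__(self, balances) -> None:
        self.balances = dict(balances)

    def transfer(self, sender, recipient, amount) -> None:
        if amount < 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("rejected")  # models a revert
        self.balances[sender] = self.balances.get(sender, 0) - amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount


class BuggyLedger(Ledger):
    """Same interface, but caches both balances before writing them back."""

    def transfer(self, sender, recipient, amount) -> None:
        from_balance = self.balances.get(sender, 0)
        to_balance = self.balances.get(recipient, 0)
        if amount < 0 or from_balance < amount:
            raise ValueError("rejected")
        self.balances[sender] = from_balance - amount
        # Wrong when sender == recipient: the stale cached value is written back.
        self.balances[recipient] = to_balance + amount


def fuzz(ledger_cls, runs: int = 2000, seed: int = 0):
    """Return the first call sequence that breaks the invariant, or None."""
    rng = random.Random(seed)  # a fixed seed keeps any failure reproducible
    accounts = ["alice", "bob", "carol"]
    for _ in range(runs):
        ledger = ledger_cls({"alice": 100, "bob": 50})
        calls = []
        for _ in range(rng.randint(1, 10)):
            call = (rng.choice(accounts), rng.choice(accounts), rng.randint(-5, 200))
            calls.append(call)
            try:
                ledger.transfer(*call)
            except ValueError:
                pass  # a rejected call is acceptable; a broken invariant is not
            values = ledger.balances.values()
            if sum(values) != 150 or min(values) < 0:
                return calls
    return None
```

Calling fuzz(Ledger) finds nothing, while fuzz(BuggyLedger) returns a call sequence whose last entry is a transfer from an account to itself, a case that hand-written unit tests easily miss. The generated inputs include negative and oversized amounts on purpose, since rejected calls are part of the behavior under test.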

Test-Driven Development

Test-Driven Development (TDD) is a methodology where tests are written before the actual code. This approach attempts to ensure that all code aligns with the specifications enforced by the tests. The workflow involves writing tests, verifying that they fail, writing the code, and confirming that the tests pass. The code written to make the tests pass should be the minimum required, which prevents superfluous code from infiltrating the project. If the tests still fail, either the tests or the code contain a bug.

Consequently, the process is repeated after a round of refactoring the code and tests. At the end of a TDD round, it's common to refactor the overall code base again and structure it more neatly, for example, by externalizing code into libraries or individual components, and then to rerun the test suite to assert that no bugs have been introduced. While TDD cannot guarantee bug-free code, it usually provides stronger assurances than retrofitting a test suite. It also avoids a common pitfall: when tests are written only at the end of the development process, fatigue sets in and test scenarios are left uncovered. TDD is primarily applied to unit tests but can also be utilized for integration testing, assuming that the components are well-defined and unlikely to change. Even though it's often perceived as a drawback, TDD compels developers and project managers to first ponder the architecture and design of their smart contracts and lay out a set of user requirements the system needs to fulfill. However, it's worth noting that all the advantages of TDD come at the cost of development speed, especially when the developers are not yet well-versed in test-driven development.
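One round of the workflow described above (write the test, watch it fail, write the minimal code, watch it pass) can be sketched as follows. The fee rule, the function names, and the passes helper are all hypothetical and exist only to show the order of the steps.

```python
def check_fee_spec(fee_fn) -> None:
    """Specification written first: a 1% fee, rounded down."""
    assert fee_fn(1000) == 10
    assert fee_fn(99) == 0
    assert fee_fn(0) == 0


def passes(check, candidate) -> bool:
    """Run a check against a candidate implementation and report the outcome."""
    try:
        check(candidate)
    except (AssertionError, NotImplementedError):
        return False
    return True


# Step 1 (failing test): only a stub exists, so the specification cannot pass.
def fee_stub(amount: int) -> int:
    raise NotImplementedError


# Step 2 (passing test): the minimal code that satisfies the specification.
def fee(amount: int) -> int:
    return amount // 100
```

Here passes(check_fee_spec, fee_stub) is False and passes(check_fee_spec, fee) is True. Confirming the failure first matters: a test that passes against the stub would not be testing anything.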

Tools

Smart contracts certainly require specialized test runners. Most development tools come with their own test runners, including notable projects like:

Hardhat, a framework for testing smart contracts based on ethers.js, Mocha, and Chai,
Remix Tests, which works underneath the Remix IDE "Solidity Unit Testing" plugin and is used to write and run test cases for a contract,
Brownie, which uses Pytest, a feature-rich test framework that allows you to write small tests with minimal code, scales well for large projects, and is highly extendable,
Foundry, whose forge tool is a fast and flexible Ethereum testing framework capable of executing simple unit tests, gas optimization checks, and contract fuzzing, and
ApeWorx, a Python-based development and testing framework.

Each of these tools offers a host of other features that extend beyond the scope of this document. Readers are encouraged to explore and experiment with each tool to identify which best aligns with their development flow. Despite debates about which tool is superior, any of them will suffice as long as it is well maintained and its developers are receptive to their community's feedback.

Mainnet Forking

Mainnet forking is a testing technique in which a local network is started from the state of an existing chain: given a block number and a connection to a blockchain node (such as Geth), the relevant state as of that block is made available locally. Developers can then run tests against this cloned state. The technique is most commonly applied to Ethereum Mainnet, particularly when testing interactions with third-party code that is already deployed.

One of the main advantages of mainnet forking is the ability to run all tests locally, which significantly reduces both the risks and costs associated with testing. Since the local network mirrors the on-chain state, it can also be reverted to its original state after each test, making for efficient and isolated tests. Moreover, the local state can be manipulated as necessary to simulate specific scenarios that might not be present in the forked chain's state. Mainnet forking is thus a potent tool, especially for testing scenarios that aren't easily covered by existing unit and integration tests.
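The three properties mentioned above (mirroring remote state, manipulating it locally, and reverting after each test) can be illustrated with a toy model. ForkedState and run_isolated are hypothetical names, and a plain dictionary stands in for the blockchain node. Real tools expose the same ideas through their own configuration and RPC methods, which are not shown here.

```python
import copy


class ForkedState:
    """Toy model of a fork: remote state pinned at one block plus local writes."""

    def __init__(self, remote_state: dict, block_number: int) -> None:
        self._remote = remote_state  # stands in for the node; never mutated
        self.block_number = block_number
        self._local = {}  # overrides written by tests
        self._snapshots = []

    def get_balance(self, account: str) -> int:
        # Local writes shadow the remote state; everything else falls through.
        if account in self._local:
            return self._local[account]
        return self._remote.get(account, 0)

    def set_balance(self, account: str, value: int) -> None:
        """Manipulate local state, for example to fund a test account."""
        self._local[account] = value

    def snapshot(self) -> int:
        self._snapshots.append(copy.deepcopy(self._local))
        return len(self._snapshots) - 1

    def revert(self, snapshot_id: int) -> None:
        # Restore the saved state and discard this and all later snapshots.
        self._local = self._snapshots[snapshot_id]
        del self._snapshots[snapshot_id:]


def run_isolated(fork: ForkedState, test) -> None:
    """Run one test against the fork and roll back whatever it changed."""
    snapshot_id = fork.snapshot()
    try:
        test(fork)
    finally:
        fork.revert(snapshot_id)


# Hypothetical balances and block number, for illustration only.
MAINNET = {"whale": 1_000_000}
fork = ForkedState(MAINNET, block_number=18_000_000)


def drain_whale(f: ForkedState) -> None:
    f.set_balance("whale", 0)  # a scenario that does not exist on the real chain
    assert f.get_balance("whale") == 0


run_isolated(fork, drain_whale)
assert fork.get_balance("whale") == 1_000_000  # state restored for the next test
```

The remote dictionary is never written to, which mirrors why forked tests carry no on-chain risk or cost: every change lives in the local overlay and disappears on revert.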

Test Structure

The readability of the test suite is just as important as that of the main code base. Organization and file structure play a central role in keeping an overview of the suite. All test-related logic should be encapsulated in a "tests" directory located in the project's root directory. This avoids mixing test files with the business logic of the code base and keeps it clean and organized.

Tests should be subdivided into unit, integration, and end-to-end tests, each in its own directory. Each of these directories can then host the utility and configuration scripts tailored to the specific setup requirements of that type of test.

Within each test directory, every area of concern should have its dedicated test file. For instance, in the case of unit tests, there should ideally be one file per unit. For integration tests, each relationship between components warrants its own test file.
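One possible layout that follows these rules is sketched below. It assumes a Pytest-based setup, where conftest.py is the conventional place for shared fixtures. The contract and test file names are hypothetical.

```text
project-root/
├── contracts/
│   ├── Token.sol
│   └── Vault.sol
└── tests/
    ├── unit/
    │   ├── conftest.py
    │   ├── test_token.py
    │   └── test_vault.py
    ├── integration/
    │   ├── conftest.py
    │   └── test_vault_token.py
    └── e2e/
        ├── conftest.py
        └── test_deposit_and_withdraw.py
```

Unit test files map to one unit each, the integration file is named after the relationship it covers, and the end-to-end file is named after a user journey.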

Each test file's name should be meaningful, providing insight into the part of the system that the tests within it cover. As the system grows more complex, there's a risk that individual tests may become bloated. To avoid this, code duplication between tests should be minimized. Instead, shared code should be externalized into utility functions or test fixtures, keeping the test logic lean, expressive, and maintainable.
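As a small illustration of externalizing shared setup, the sketch below moves the "deploy and fund" steps into one helper, funded_token, which two separate test classes reuse. Token is a hypothetical in-memory model, and the helper name is an assumption. In a Pytest-based project the same role would typically be played by a fixture.

```python
import unittest


class Token:
    """Hypothetical in-memory stand-in for a deployed token contract."""

    def __init__(self) -> None:
        self.balances = {}

    def mint(self, account: str, amount: int) -> None:
        self.balances[account] = self.balances.get(account, 0) + amount

    def burn(self, account: str, amount: int) -> None:
        if self.balances.get(account, 0) < amount:
            raise ValueError("burn exceeds balance")
        self.balances[account] = self.balances.get(account, 0) - amount


def funded_token(**balances: int) -> Token:
    """Shared helper: create a token and mint the requested opening balances."""
    token = Token()
    for account, amount in balances.items():
        token.mint(account, amount)
    return token


class MintTest(unittest.TestCase):
    def test_mint_adds_to_existing_balance(self) -> None:
        token = funded_token(alice=10)
        token.mint("alice", 5)
        self.assertEqual(token.balances["alice"], 15)


class BurnTest(unittest.TestCase):
    def test_burn_reduces_balance(self) -> None:
        token = funded_token(alice=10)
        token.burn("alice", 4)
        self.assertEqual(token.balances["alice"], 6)

    def test_burn_above_balance_is_rejected(self) -> None:
        token = funded_token(alice=10)
        with self.assertRaises(ValueError):
            token.burn("alice", 11)
```

Each test body now contains only the action and the assertion that are specific to it. If the setup changes, for example because deployment needs an extra step, only the helper has to be updated.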