Batteries Included
Every project needs an easy, well-defined development process that lets new contributors onboard quickly. Ideally, anyone can clone the repo, run a couple of commands from the README, and have a tangible product at their fingertips. Not every project can incorporate every quality-of-life improvement, but every project should be able to get close. Even if a project requires shared infrastructure or an integration environment for comprehensive testing, it is still important to package up as much of the process as possible for other developers. This not only improves the development process but also helps shift security concerns even further left in the development cycle.
While it is best to implement these points as early as possible, not every project is ready to be this opinionated about its structure. A good rule of thumb is to implement them no later than the proof of concept phase. They make good last steps for the first point release.
Tests Show How the Product Works
The most important piece of a project is arguably its tests. Everybody has heard the spiel about why unit tests are important, and I am not about to write a book on that, but tests provide much more. When I pick up a project, one of the first things I do is run the test suite and see what the broader integration tests do. Tests that verify the API contracts, user interfaces, and the like define how real-world users interact with the application. Rather than testing discrete components, they represent the overall business requirements of the application. These might be Cucumber tests or other acceptance tests. If a project does not have these, especially a project in need of larger refactoring, my first priority is to set up these broader integration tests. Projects can have unit tests, and their granularity is up to development preference and the size of the project, but a project should have at minimum a set of tests that prove the functionality of the program as a whole.
As a bonus, a project can provide a local sandbox. Docker is a good choice, as a project can define the minimal set of infrastructure (e.g. database and caching systems) the application needs. Having a production-like environment locally makes it possible to explore the application more thoroughly and to test in a production-like setting. Typically I will have the application's container and a separate container with a runlist of actions to take against the application, like a Python script making REST calls. This sandbox also serves as a reference implementation for the application.
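A runlist like the one described above can be sketched with nothing but the Python standard library. Everything named here is an assumption made for illustration: the `/health` and `/items` endpoints, the `StandInApp` handler that plays the role of the application container, and the function names. In a real sandbox the runlist would point at the application's own container instead of the stand-in.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StandInApp(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the application container."""

    items = []  # in-memory state shared across requests

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        elif self.path == "/items":
            self._send_json(200, {"items": self.items})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/items":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        item = json.loads(self.rfile.read(length))
        self.items.append(item)
        self._send_json(201, item)

    def log_message(self, format, *args):
        pass  # keep the run output quiet


# Bypass any proxy configured in the environment; the sandbox is local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def call(base_url, method, path, payload=None):
    """Make one REST call and return (status, decoded JSON body)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(base_url + path, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with _opener.open(request) as response:
        return response.status, json.load(response)


def run_runlist(base_url):
    """Walk the application through a user-level scenario, step by step."""
    return [
        call(base_url, "GET", "/health"),
        call(base_url, "POST", "/items", {"name": "widget"}),
        call(base_url, "GET", "/items"),
    ]


def run_against_stand_in():
    """Start the stand-in on a free local port and run the runlist against it."""
    StandInApp.items.clear()
    server = HTTPServer(("127.0.0.1", 0), StandInApp)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return run_runlist(f"http://127.0.0.1:{server.server_port}")
    finally:
        server.shutdown()
        server.server_close()
```

Because `run_runlist` only takes a base URL, the same script can be aimed at the sandbox container or at an integration environment without changes.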
Documentation (the D Word)
Everybody dreads documentation. It takes time that could be spent on tech debt, and it can be pretty boring, but documentation can also help set up a project (see Readme Driven Development), and certain points are especially important to document. Usually I include a few main sections in a project's README, a couple of pages' worth, that let someone know at a glance what the project is. This approach is pretty common, and we have all seen it used in various open source projects. I always ensure the following points are covered:
- Description: what is the project; why does it exist
- Installation: how to set the application up for production use
- Usage: general guidance on how to interact with the product; if this gets too long, it gets split out to docs/ with a link to it
- Development: how to contribute to the project (TOS) and how to set up a local development environment; if this gets too long, it gets split out to a separate CONTRIBUTION file with a link to it
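If it helps to keep these sections from drifting away, a small check can confirm they exist. The following is only a sketch under a few assumptions of mine: that the README is Markdown with `#`-style headings, that the headings are named exactly after the four points above, and that the helper is called `missing_sections` (a hypothetical name).

```python
import re

# Section names taken from the list above; adjust to the project's own wording.
REQUIRED_SECTIONS = ("Description", "Installation", "Usage", "Development")

# Matches Markdown ATX headings such as "## Usage" (assumed README format).
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def missing_sections(readme_text, required=REQUIRED_SECTIONS):
    """Return the required section names that have no matching heading."""
    headings = {m.group(1).strip().lower() for m in HEADING.finditer(readme_text)}
    return [name for name in required if name.lower() not in headings]
```

Calling `missing_sections(open("README.md").read())` returns an empty list when all four sections are present, which makes it easy to wire into a test or a lint step.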
Another relatively recent trend is to include live documentation such as Jupyter books. In addition to the points above for the README, live documentation can give developers a deeper dive into the application and serve to replace otherwise verbose training material.
SDLC and CI/CD
All projects need a well-defined development lifecycle that guides developers through the process from writing code to running it in production. The scope and tools used will vary by project, but all lifecycles share some commonalities. The first step is to decide on a git workflow, of which there are many. Gitflow is popular but more complex. Even a simple workflow of feature branches and a main branch with tags can work. This git model will determine the rest of the SDLC. The next question is how features are tracked and how bugs and requests are submitted. Most issue tracking software has methods of defining an SDLC and may interact directly with the code. A continuous integration job should be set up for pull requests* to vet that changes conform to project contribution requirements (tests, lint, etc.). If applicable, a continuous deployment job could be defined. I try to limit the repository's continuous deployment job to packaging up the artifact, letting our production environment treat the repository no differently than a third-party application. This also helps keep the project from knowing about its production implementation, since the only implementation it contains is the reference implementation.
* There is an ever-growing threat on public source control sites of attempts to inject malicious code (cryptominers, scripts to steal secrets) into CI pipelines via pull requests. Always be aware of how injection can affect public projects.
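The pull request check described above can be as simple as a script that runs each contribution requirement and reports which ones failed. This is a minimal sketch, not a recommendation for any particular CI system: the `CHECKS` table, the `tests` directory, and the function names are assumptions, and a real project would substitute its own test and lint commands. Given the footnote, a job like this on a public repository should run without access to secrets, since it executes code from the pull request.

```python
import subprocess
import sys

# Hypothetical check list; replace with the project's own test and lint commands.
CHECKS = {
    "tests": [sys.executable, "-m", "unittest", "discover", "-s", "tests"],
    "syntax": [sys.executable, "-m", "compileall", "-q", "."],
}


def run_checks(checks):
    """Run each named command and return the names of the ones that failed."""
    failed = []
    for name, command in checks.items():
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
            print(f"[FAIL] {name}\n{result.stdout}{result.stderr}")
        else:
            print(f"[ OK ] {name}")
    return failed


def main():
    """Return a process exit code: 0 if every check passed, 1 otherwise."""
    return 1 if run_checks(CHECKS) else 0
```

A CI job would call `sys.exit(main())` so that any failed check fails the pull request. Keeping the checks in a script inside the repository also means contributors can run exactly the same gate locally before opening a pull request.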