The data team at Airbnb have written an interesting article on how to manage data science research as you bring more and more people on board.
They developed an internal process and tool based on five key tenets
- Reproducibility—There should be no opportunity for code forks. The entire set of queries, transforms, visualizations, and write-up should be contained in each contribution and be up to date with the results.
- Quality—No piece of research should be shared without being reviewed for correctness and precision.
- Consumability—The results should be understandable to readers besides the author. Aesthetics should be consistent and on brand across research.
- Discoverability—Anyone should be able to find, navigate, and stay up to date on the existing set of work on a topic.
- Learning—In line with reproducibility, other researchers should be able to expand their abilities with tools and techniques from others’ work.
Their tool is built on top of Git, Jupyter notebooks and (R)Markdown.