A well-organized project structure will ensure:
- Reproducibility - Anyone rerunning your project should be able to obtain the same results
- Readability - The project report must be easy for decision makers to interpret
- Consistency - The structure should contain no surprises; its parts should fit together coherently
- Easy collaboration - Team members can collaborate effectively if the structure is agreed on before the project starts
To help you set up such a structure for the Datathon, here are a few links. Choose whichever feels most comfortable and best aligned with your goals.
How To Organize Your Project: Best Practices for Open Reproducible Science
Cookiecutter Data Science
How to Start a Data Science Project in Python - GoDataDriven
No matter what structure you choose, keep the following points in mind:
- Do not mutate raw data; modify only copies of it
- Run Jupyter notebooks from top to bottom before submitting to ensure that all cells run without errors
- Use comments where necessary to help evaluators understand the purpose of your code
- Write docstrings for user-defined functions to document their usage
- Maintain a separate notebook for each phase of the data science process; shorter notebooks reduce the risk of passing on erroneous code
- Use relative paths to files in your project structure
- Use a version control system such as Git in order to keep track of all work being performed
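
Several of the points above (leaving raw data untouched, using relative paths, and writing docstrings) can be illustrated together. The following is a minimal sketch, not a prescribed layout: the folder names `data/raw` and `data/interim` and the helper `make_working_copy` are assumptions made for illustration, so adapt them to whichever structure you choose.

```python
from pathlib import Path
import shutil

# Hypothetical folders, written relative to the project root so the
# project still works when it is moved or cloned to another machine.
RAW_DIR = Path("data") / "raw"
WORK_DIR = Path("data") / "interim"


def make_working_copy(raw_file: Path, work_dir: Path) -> Path:
    """Copy a raw data file into a working directory.

    The raw file is only read, never written to, so the original
    data stays intact. Returns the path of the new copy.
    """
    # Create the working directory if it does not exist yet.
    work_dir.mkdir(parents=True, exist_ok=True)
    working_file = work_dir / raw_file.name
    # copy2 copies the file contents and, where possible, its metadata.
    shutil.copy2(raw_file, working_file)
    return working_file
```

A call such as `make_working_copy(RAW_DIR / "survey.csv", WORK_DIR)` (the file name is hypothetical) would then return the path of a copy that can be cleaned or transformed freely, while the file under `data/raw` remains as delivered.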