Data is the new gold, as an article from Deloitte put it. Multiple studies and articles have shown that the process for building and deploying a data science (DS) model varies with the business problem and the organization's stakeholders. In my experience, I follow a simple n-phase system that depends heavily on both. Any phase may need to be repeated when challenges arise from stakeholders, data or validation.
The list below gives a macro view of the phases I follow for any DS project.
- Communicate with key stakeholders to identify the problem
- Communicate with key stakeholders to identify data mining and collection activities, and conceptualize the solution with the blessing of the business.
- Identify IT resources required to productionize the model (IT Infrastructure, Application Developers, Business Analysts etc.)
- Data engineering: identify the data engineering requirements based on the problem. The main question is whether the data should be processed in batches (consolidated results processed at intervals set by the business requirement, e.g. end of week, month or year) or as a stream (real-time predictions fed by real-time ETL jobs to drive real-time actions).
- Conduct the exploratory data analysis (EDA) and build the model using training and testing splits
- Validate the model by comparing actual and estimated values, using metrics such as, but not limited to, the confusion matrix, Mean Absolute Error (MAE) and Mean Squared Error (MSE), together with techniques such as cross-validation (CV).
- Assess model performance by identifying how much data is needed for the model to be accurate.
- Communicate with key stakeholders to identify model deployment mechanisms (Visualizations, Key Performance Indicators, alert systems etc.).
- Rebuild the model to fit the chosen deployment mechanism.
- Provide ongoing support (regular model and data validation and re-evaluation)
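
To make the split and validation phases in the list above more concrete, here is a minimal sketch in plain Python. It is not a definitive implementation: the synthetic data, the simple least-squares line used as the "model", the threshold of 75 and every function name (`train_test_split`, `fit_line`, `cross_validate` and so on) are my own assumptions for illustration. In a real project you would more likely reach for a library such as scikit-learn.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mae(actual, estimated):
    """Mean Absolute Error between actual and estimated values."""
    return mean(abs(a - e) for a, e in zip(actual, estimated))


def mse(actual, estimated):
    """Mean Squared Error between actual and estimated values."""
    return mean((a - e) ** 2 for a, e in zip(actual, estimated))


def confusion_matrix(actual, estimated):
    """Counts for binary labels, keyed by (actual, estimated)."""
    counts = {(a, e): 0 for a in (0, 1) for e in (0, 1)}
    for a, e in zip(actual, estimated):
        counts[(a, e)] += 1
    return counts


def cross_validate(rows, k=5):
    """k-fold cross-validation: MSE on each held-out fold."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        slope, intercept = fit_line(train)
        preds = [slope * x + intercept for x, _ in held_out]
        scores.append(mse([y for _, y in held_out], preds))
    return scores


# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise.
rng = random.Random(42)
data = [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in range(50)]

train, test = train_test_split(data)
slope, intercept = fit_line(train)
actual = [y for _, y in test]
estimated = [slope * x + intercept for x, _ in test]

print("MAE:", mae(actual, estimated))
print("MSE:", mse(actual, estimated))
print("CV MSE per fold:", cross_validate(data))

# Classification framing (hypothetical): is y above a threshold of 75?
labels_actual = [int(a > 75) for a in actual]
labels_estimated = [int(e > 75) for e in estimated]
print("Confusion matrix:", confusion_matrix(labels_actual, labels_estimated))
```

The model is fitted only on the training rows and scored only on the held-out rows, so the error reflects data the model has not seen. Cross-validation repeats that idea across several folds, which gives a steadier estimate when the data set is small.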
The points above are the steps I have taken when moving DS concepts into production. Please feel free to add your input, as it will help me improve my process as well.