My favourite stage of the DS process is thinking about what data would be ideal to solve a particular problem.
This is an abstract and theoretical moment when you can be as creative as you want. Forget about data silos, infrastructure problems or data quality issues. It is a wishlist. You can order whatever you want.
I like doing this in isolation, just letting my mind wander and flow. In my opinion, this is probably one of the most creative and fun parts of the project because you have no constraints except your own knowledge of your sector or the process at stake. And that’s another reason why this is a fabulous moment: it allows you to test your own knowledge.
But at some point I will get stuck, and that’s when I move on to the next step of this early stage of a project. Truth is that most of our work is optimization. In many cases we aim to make some process more automatic, so it is likely that someone was already working on that task before you arrived at the organization. So at this point you go and talk to people, sometimes people you have never heard of if you are in a big company. It is also quite a remarkable moment in the project, because you will be learning more about your domain and you start to glimpse the purpose of your work: making someone’s life a bit better so that the business machine stays oiled and running.
Don’t fool yourself: talking to people and trying to extract information about how they work usually takes way longer than you expect. If you are facing a complicated project, you should start thinking about how to make this new long-term relationship as successful as possible. The best way is probably to keep them informed about your progress and your frustrations.
Before going deeper and starting to read scientific papers and blog posts by companies that have faced similar challenges, I like chatting with my colleagues. At this point you are usually quite confident about what the ideal data looks like, as you have devoted some hours to thinking about the problem. However, every time I do this, a colleague points out a hole in my ideal dataset and proposes something new and smart.
Now read papers, blogs and so on with a critical mindset. Can you spot the excellence in the data they are using? Can you spot the data challenges they might be facing? Try to read between the lines of that tech giant’s post that brags about how well they are solving the challenge you are facing.
The final step would be to create a matrix that maps the potential and the required effort of each piece of data in your wishlist.
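If it helps to make that matrix concrete, here is a minimal sketch of one way it could look in Python. Everything in it is an assumption for illustration: the 1 to 5 scoring scale, the threshold of 3, the quadrant labels, and the example wishlist items are all made up, and you would score your own wishlist with the people you talked to.

```python
from dataclasses import dataclass


@dataclass
class WishlistItem:
    name: str
    potential: int  # assumed scale: 1 (low) to 5 (high) expected value
    effort: int     # assumed scale: 1 (low) to 5 (high) cost to obtain and clean


def quadrant(item: WishlistItem, threshold: int = 3) -> str:
    """Place an item in one of four (hypothetical) quadrants."""
    high_potential = item.potential >= threshold
    high_effort = item.effort >= threshold
    if high_potential and not high_effort:
        return "quick win"
    if high_potential and high_effort:
        return "big bet"
    if not high_potential and not high_effort:
        return "nice to have"
    return "deprioritise"


def build_matrix(items, threshold: int = 3) -> dict:
    """Group item names by quadrant, keeping the input order."""
    matrix = {"quick win": [], "big bet": [], "nice to have": [], "deprioritise": []}
    for item in items:
        matrix[quadrant(item, threshold)].append(item.name)
    return matrix


# Hypothetical wishlist, scored by gut feeling.
wishlist = [
    WishlistItem("customer transaction history", potential=5, effort=2),
    WishlistItem("call centre transcripts", potential=4, effort=5),
    WishlistItem("public holiday calendar", potential=2, effort=1),
    WishlistItem("competitor pricing feed", potential=2, effort=4),
]

matrix = build_matrix(wishlist)
for label, names in matrix.items():
    print(f"{label}: {', '.join(names) or '-'}")
```

The scores are subjective on purpose. The point is less the numbers than the conversation they force: which pieces of data are worth chasing first, and which ones can stay on the wishlist for now.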