Interview with Data Science Expert Kai Xin Thia, Data Scientist at Lazada, Co-Founder of DataScience SG
Interview with Kai Xin Thia
We are excited to present Kai Xin Thia as the first data scientist interviewed for our Dataaspirant blog readers. He shared some interesting thoughts about data science, so let us see what he had to say.
Hi Kai Xin Thia, we are delighted to interview you, and thanks a lot for your time. Before we begin the interview, let me introduce Kai Xin Thia.
Kai Xin is a data scientist at Lazada. He specializes in behavioral analytics and has an interest in large recommendation systems. He has been building behavioral models for 3 years and ranks in the top 1% on Kaggle, an international data science competition portal. He is also the Co-Founder of DataScience SG (the largest data science community in Singapore) and a volunteer at DataKind SG (an NGO that helps other NGOs through data science).
Hi Thiakx! Let me start by asking about your background. Can you describe your background for our data science enthusiasts?
Hi, I began my data journey at Singapore Management University, where I graduated with a degree in information systems, business intelligence and analytics. I then spent time working at SAS and EMC, building my foundation. I moved on to focus on healthcare analytics at Khoo Teck Puat Hospital and I am currently at Lazada, working on retail and behavioral analytics.
That’s great, now we know about your healthcare analytics work. What is your definition of data science?
Sure. Generally, data science is the use of hacking skills, math and stats, and domain expertise to generate useful insights for business.
I am personally excited to know more about you. How did you get started with data science, and what inspired you to pursue it?
When I first got started, it was all about business intelligence and business analytics. It was pretty much about generating reports to understand the current performance of businesses. Things started to get interesting when I started on Kaggle, building predictive models based on historical data.
Can you share your experience with data science (especially regarding your projects and your startup “Foxhole”)?
I would say there is a growing interest among companies in Singapore (and probably Asia in general) in using data science in their operations, but we are still behind our US counterparts.
You have been building behavioral models for 3 years. Can you give us an introduction to behavioral models and some insights about them?
Behavioral modeling, as its name suggests, is about understanding why people behave or respond in a certain way and how we can encourage them to adjust their behavior using data models. Beyond data models, there is a lot to learn in this field, for example, how predictably irrational most people are: http://www.amazon.com/Predictably-Irrational-Revised-Expanded-Edition/dp/0061353248
You have participated in data science competitions. You were the 2nd place winner in the “Unilever Prediction Challenge on consumer preference” and “Singapore’s Data in the City Visualization Challenge on education”. Can you share your experience of those?
Data in the City was interesting as we took the chance to research and understand Singapore’s education journey: how we grew from a third-world, impoverished country into a developed city with an education system that attracts students from all around the region. In the Unilever challenge, we had the opportunity to present to management and learn from them what truly matters: sometimes it is not just about building the most complex models but rather about balancing model accuracy with the ease of deploying the models into production.
You studied information systems at Singapore Management University. How has information systems helped you in your career? What would you recommend to data science enthusiasts in this regard?
University is the best time to pick up technical skills. If you are interested in trying out or entering the data science industry, don’t be afraid to sign up for some difficult mathematics / statistics / machine learning modules. Use this opportunity to make mistakes and learn from them.
What is your opinion of online courses for data science? Which online courses do you recommend for data science enthusiasts?
Coursera / edX / Stanford Online are fantastic platforms for learning. Here is what I recommend:
- Intro: https://www.edx.org/course/analytics-edge-mitx-15-071x-0
- Foundation skills of a data scientist:
- Statistical learning: https://lagunita.stanford.edu/courses/HumanitiesandScience/StatLearning/Winter2015/about
- Machine learning: https://www.coursera.org/learn/machine-learning/outline
- Mining massive data: https://www.coursera.org/course/mmds
- Probabilistic Graphical Models (difficult course): https://www.coursera.org/course/pgm
- Introduction to Recommender Systems: https://www.coursera.org/learn/recommender-systems
- Social Network Analysis: https://www.coursera.org/course/sna
*The Johns Hopkins data science specialization is not worth the money but is alright for a quick introduction to data science.
Can you share your favorite list of data science books for us?
- An Introduction to Statistical Learning (free): http://www-bcf.usc.edu/~gareth/ISL/
- The Data Science Handbook: http://www.thedatasciencehandbook.com/
- Machine Learning for Hackers: http://shop.oreilly.com/product/0636920018483.do
- Information Dashboard Design: http://www.amazon.com/Information-Dashboard-Design-At-Glance/dp/1938377001
- Visualize This: http://www.amazon.com/Visualize-This-FlowingData-Visualization-Statistics/dp/0470944889
- Learning Spark: http://shop.oreilly.com/product/0636920028512.do
- Advanced Analytics with Spark: http://shop.oreilly.com/product/0636920035091.do
- My data science Trello board: https://trello.com/b/rbpEfMld/data-science
What do you think are the prerequisites for a data science beginner who is starting from zero?
Not giving up. Most smart, rational peeps give up after 3-6 months because it is too hard / too boring / not earning them money. It takes years to train a doctor; it takes at least as long to train a data scientist.
What are the best programming languages for data science and which one is your favorite?
Learn R for quick prototyping, Python to deal with larger datasets and Apache Spark for enterprise-level work.
What are the primary questions asked in data scientist interviews?
See the list from Quora; it is quite accurate.
I would like to add one question that I was asked before: “Describe to me the greatest data project that you have worked on so far.”
What is the present scope of data science, and how will it look in the future?
Current popular data science tools (R/Python) are limited to single machines while enterprise software tools (SAS / Teradata) are expensive and unwieldy. Next-generation tools like Spark will bridge the gap, bringing enterprise-level scalability to popular data science tools (R/Python).
Final question. Can you share your opinion of our blog?
It would be really interesting if you could interview more data scientists 🙂
Sure. We have more interviews coming up 🙂
Thank you so much for this enlightening interview. It will definitely add value for our readers. Once again, thank you.
I hope you liked today's interview. If you have any questions, feel free to comment. If you want to ask a data scientist a question, let us know in the comments so that we can ask it in our next data scientist interviews. You can find the link for comments just below the title.