Data Scientist: Milestone Checkpoint

I was having a discussion with myself today while ignoring a boring part of a podcast. I had just finished my Capstone report for my Udacity Machine Learning Nano Degree (which you should read if you’re interested in predicting cybersecurity exploits from vulnerabilities), and was trying to decide what course of action to pursue next. For me, the realm of Data Analysis, Machine Learning, Statistics, and Data Science (all those buzzwords) is a new field. I’ve spent 12 years doing mostly embedded firmware engineering. While some of the Software Engineering and Software Development Lifecycle concepts overlap, the content and technical experience are very different. I find myself continuously eager to learn the next thing, take the next course, start working through the next textbook.

P/PC

For those of you who’ve read The 7 Habits of Highly Effective People, this is called Production Capacity. My sole focus lately has been on increasing ‘what’ I know in this new field. After 12 years, there was little that surprised me in the embedded space; I had my 10,000 hours in and then some.

I lived and breathed the field, and my knowledge was both deep and wide. I want that same level of knowledge in this new space. Then it dawned on me that it’s going to take 12 years to reach the same level of expertise, at least at the domain level. I’ve been focusing mostly on my knowledge, my Production Capacity, but not on my experience, my Production.

I find myself offering Software Engineering/Programming advice to people entering the field. Every single time, my advice to them is to just get out there and program. Find a stupid project and just write some code for it. Follow tutorials, or do a tutorial written for one language in another language. My advice to others has always been to increase Production. Much like when building a race car motor, build for torque and the horsepower will come. We often don’t follow our own advice.

There to Here

I started down this path in April 2017 with Andrew Ng’s course (wow, has it really only been 17 months??). I had probably listened to a bunch of podcasts and done some reading before that, but that was my introductory course. Since then I’ve completed Udacity’s Artificial Intelligence Nano Degree and Udacity’s Machine Learning Nano Degree, in addition to lots of different descriptive and inferential statistics courses. I have probably a dozen more that I want to take, in addition to books to read. But all of these courses just build my Production Capacity and don’t further my experience level.

I’m not sure if it would have made much sense to start exploring different data sets before now, or if now is the right time. Either way, at this checkpoint my personal goal is to explore a dataset a week and publish the results. They will be awful, I’ll make mistakes, and I’ll overlook many obvious things. However, much like someone writing code for the first time, you need a lot of segfaults before you realize that your C strings need a ‘\0’ at the end. I’ll start with the end in mind and take my own advice. There will always be more to learn, but it’s time to start putting some of it into practice. The final capstone project allowed me to go through the full process of scoping a problem, finding and extracting data, and coming up with a solution. It was truly an empowering experience that I intend to push forward with. The only piece of the puzzle not completed during the capstone was putting the model into production. This part will need to wait, perhaps for a different data source.