Set up your environment

As a Java developer, I tend to pick up a new subject using a language I am already familiar with, so I have one less thing to worry about. However, for machine learning, I suggest we get out of our comfort zone and learn Python. Here are some reasons:

  • Python is a popular language in data science thanks to the strength of its core libraries (NumPy, SciPy, pandas, matplotlib, IPython).
  • Python is one of the most used languages at Google, along with Java.
  • TensorFlow has a Python API, and we will use it to scale our machine learning pipeline later.
  • I know I am going to get into deep learning, and there is great support from frameworks and libraries in this space. To name a few: Gensim, TensorFlow, Keras, Caffe, nolearn and more. Check this link to grab the list.

The major issues I came across when coding in Python are dependency management and version compatibility. Python 3 is not fully backward compatible with Python 2. Most data science tasks need quite a few third-party dependencies, and the last thing you want is to sort through getting all those library versions to fit together. To lessen the headache, I recommend installing Anaconda, which bundles popular data science libraries and lets you create your own virtual environments so that Python 2 and Python 3 libraries do not step on each other. You can follow this video to set it up. On top of that, it comes with Jupyter Notebook (formerly IPython Notebook), which gives you an interactive environment for coding in Python. You can get familiar with it from this video.
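Once you have more than one environment, it is easy to lose track of which interpreter you are actually running. Here is a minimal sketch, using only the standard library, that reports the interpreter's version and location. The function name describe_interpreter is my own and not part of Anaconda.

```python
import sys


def describe_interpreter():
    """Return basic facts about the interpreter running this script."""
    return {
        # e.g. "3.6.1" or "2.7.13", depending on the active environment
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "major": sys.version_info[0],
        # Full path of the python binary; inside an Anaconda environment
        # this normally sits under that environment's directory.
        "executable": sys.executable,
        # Root directory of the active installation or environment.
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    for key in ("version", "executable", "prefix"):
        print("{:<12}{}".format(key, info[key]))
```

Run it from a terminal or a notebook cell after activating an environment. If the executable path is not the one you expected, the environment was probably not activated.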

Common packages/libraries that you may need to install

  • tensorflow – Google's deep learning library
  • nltk – Natural language processing (NLP) toolkit
  • scikit-learn – Popular machine learning library for Python
  • matplotlib & seaborn – Data visualization tools
  • tabulate – Pretty printing for tabular data
  • beautifulsoup4 – HTML parser
  • gensim – Topic modeling and word-embedding (word2vec) library
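To see which of these are already present in the current environment, you can try something like the sketch below. It relies on importlib.util.find_spec from the Python 3 standard library. Note that the name you install is not always the name you import: scikit-learn is imported as sklearn and beautifulsoup4 as bs4. The PACKAGES table and the find_missing helper are my own names for illustration.

```python
import importlib.util

# Install name (pip/conda) -> module name used in an import statement.
PACKAGES = {
    "tensorflow": "tensorflow",
    "nltk": "nltk",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "tabulate": "tabulate",
    "beautifulsoup4": "bs4",
    "gensim": "gensim",
}


def find_missing(packages):
    """Return the install names whose module cannot be found."""
    return [
        name
        for name, module in packages.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing(PACKAGES)
    if missing:
        print("Not installed in this environment: " + ", ".join(missing))
    else:
        print("All listed packages are available.")
```

This only checks that each module can be located. It does not import the packages or verify that their versions work together.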

Write Python Code in IntelliJ

Actually, you can use any text editor to write Python code. But I prefer IntelliJ because I like to step through Python code in debug mode, so I can examine variables without printing them out. Below are the steps to follow if you want to set up IntelliJ for Python.

  • Create a new project (File > New > Project) and select “Python”.
  • Under Project SDK, click the New button and point it to the interpreter at “/anaconda/envs/abc/bin/python” (here “abc” stands for the name of your environment).
  • Create a new Python file.

After that, you can type in Python code and run it. If you want, you can add a breakpoint and run it in debug mode.
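If you want something small to try the debugger on, here is a sketch that needs nothing beyond the standard library. The function running_totals is a made-up example, chosen only because its variables change on every loop iteration.

```python
def running_totals(values):
    """Return the cumulative sum of values, one entry per input."""
    totals = []
    current = 0
    for value in values:
        # Put a breakpoint on the next line and watch `current`
        # and `totals` change as you step through the loop.
        current += value
        totals.append(current)
    return totals


if __name__ == "__main__":
    print(running_totals([3, 1, 4, 1, 5]))  # prints [3, 4, 8, 9, 14]
```

Stepping through the loop shows the value of each variable in the debugger, which is exactly what I would otherwise do with print statements.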
