Artificial intelligence is a hot topic right now. As the volume of accumulated information grows, so does the demand for specialists who can handle it. The question is not limited to selecting the right tools. A specialist must be well oriented both in the technology and in the business whose data they process. The technical side, in turn, is not limited to knowing a single language. It often implies knowledge of several languages, the ability to build a system in a heterogeneous environment, and knowledge of advanced mathematics and applied statistics.
The entry threshold for the specialty is quite high. You need not only to accumulate knowledge but also to learn to understand the data and apply the tools correctly. A growing number of Massive Open Online Courses (MOOCs) can help you get started, and further in this article you’ll find a list of them. To make this list relevant, I gathered reviews and feedback from participants in several Data Science communities. But before diving into the courses, let’s recap what Data Science is and which skills you will need.
Data Science Specialist Profile
Data Science is a very broad topic. When I first encountered this discipline, I remember being confused by the number of skills I had to learn. But if you move in small steps, everything is possible. The entire discipline can be divided into two major specializations: Data Scientist and Data Engineer. There are many articles about their differences and skills, for example O’Reilly’s “Data engineers vs. data scientists”. In this article, I will only summarize the final view. Most likely you’re asking yourself which language you should learn: Python or R.
My personal opinion is that if you want to move towards the Data Scientist specialty you should learn R, and if you want to move towards Data Engineer, Python. Since I have a background as a backend developer, I tend to choose Python, and the recommendations that follow focus more on the Data Engineer specialization. Bear in mind that this discipline is a crossroads of different practices and brings them all together. If we’re talking about database development, for example, it may involve not only relational databases but also NoSQL, and products from different vendors. Or it may be cloud storage or files stored on disk. A specialist should have enough experience to understand which tool to use and when.
Data Science concepts common to both specializations:
- Learning types: supervised, unsupervised, reinforcement
- Basics of Neural Networks
- Basics of Applied Statistics and Advanced Mathematics
- Understanding which tasks can be solved
Data Engineer should dig into the following topics:
- Programming languages such as Python, R
- basics and concepts
- ML/DL libraries: pandas, scikit-learn, TensorFlow, etc.
- Building ETL pipelines: Hadoop, Spark, MapReduce, etc.
- Cloud services and Data warehouse architecture
- Distributed and service-oriented architecture
- Deeper knowledge of Neural Networks
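To make the "ETL pipeline" item in the list above less abstract, here is a minimal sketch of the extract, transform, load idea using only the Python standard library. It is not how Hadoop or Spark work internally; the sample data, the `payments` table, and all function names are hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this could come from files, an API, or a queue.
RAW_CSV = """user_id,country,amount
1,de,10.50
2,US,3.20
3,de,
4,us,7.30
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount, normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["user_id"]), row["country"].upper(), float(row["amount"]))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Chain the three stages: extract -> transform -> load."""
    load(transform(extract(text)), conn)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
run_pipeline(RAW_CSV, conn)

# Aggregate the loaded data per country.
totals = dict(
    conn.execute("SELECT country, SUM(amount) FROM payments GROUP BY country")
)
print(totals)
```

Keeping each stage as a separate function makes it easy to test the stages in isolation and to swap the storage later, which is the same separation the big-data tools above apply at a much larger scale.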
Data Scientist should dig into the following topics:
- Flawless understanding of Applied Statistics and Advanced Mathematics
- Flawless understanding of Regression, Classification, Clustering
- Understanding of DL model development (RNN, Dense, Convolutional)
- Fluency in DL/ML (Statistics, Fraud/Anomaly detection, user segmentation, recommendation systems, A/B testing) and Data Analytics
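If the difference between the learning types and task types mentioned above is still fuzzy, the following toy sketch may help. It contrasts a supervised task (regression, where the answers `ys` are given) with an unsupervised one (clustering, where no labels exist). The functions `fit_line` and `two_means` and the data are made up for illustration; real projects would normally use a library such as scikit-learn instead.

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised learning (regression): least-squares fit of y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return a, my - a * mx


def two_means(points, iterations=10):
    """Unsupervised learning (clustering): 1-D k-means with k=2."""
    centers = [min(points), max(points)]  # simple deterministic initialization
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            # Assign each point to the nearest of the two centers.
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Toy data, made up for illustration.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # the targets ys are known
centers = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])  # no labels at all

print(a, b)
print(centers)
```

In the first call the model learns from known answers; in the second it has to discover the two groups on its own, which is the essential distinction between the two learning types.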
Distinguish Two Roles
And last but not least: don’t be misled by this picture, which is often used all over the internet. As a future specialist, you should teach your managers to distinguish these two roles and not to expect one person to know every aspect of the discipline. The right skill set and its smart application are the key to benefiting from this practice.
Keeping all of the above in mind, my colleagues from Data Science Society and I created this list for those of you who are just starting with this discipline. We’ll try to update the list from time to time, adding more relevant courses for both specializations. We would appreciate your feedback in the comments section about the courses you have taken.
All courses are divided into several areas:
- Programming Languages
- Mathematics and Statistics
- Machine Learning
- Deep Learning and Neural Networks
- Data Engineer Concepts
For each area, I chose only a few courses that have good feedback from the communities.
Programming Languages
| Course | Topics | Length | Price | Description |
|---|---|---|---|---|
| DAT208x: Introduction to Python for Data Science | Python, NumPy, pandas, Matplotlib | 1 | 100 | This course has everything you need at the beginning. It covers NumPy, pandas, visualization with Matplotlib, and the Python language. The materials are very good and well presented. It is done in collaboration with DataCamp, and you can try new things hands-on in their IPython shell. |
| DSE200x: Python for Data Science | Python, Jupyter, pandas, NumPy, Matplotlib | 2 | 350 | This course is part of the Data Science MicroMasters program. It covers important packages and techniques for working with data in Python and also introduces you to ML concepts. |
| 6.00.1x: Introduction to Computer Science and Programming Using Python | Python and algorithms | 2 | 50 | This is a course for beginners. You’ll learn Python, algorithms, and data structures. |
| Applied Data Science with Python Specialization | Python with packages for plotting and data analysis | 5 | 250 | The specialization consists of 5 courses. It lets you learn packages and techniques that are standard in the industry. Together they cover statistics, machine learning, visualization, text analysis, social network analysis, and Python toolkits such as pandas, matplotlib, scikit-learn, nltk, and networkx. |
Mathematics and Statistics
| Course | Topics | Length | Price | Description |
|---|---|---|---|---|
| Probability – The Science of Uncertainty and Data | Probabilistic models, inference methods, laws of large numbers | 4 | 300 | The course covers all of the basic probability concepts, such as multiple discrete or continuous random variables, expectations, and conditional distributions, laws of large numbers, the main tools of Bayesian inference methods, and an introduction to random processes (Poisson processes and Markov chains). |
| Matrix Algebra for Engineers | Matrices, vector spaces and linear algebra | 1 | 50 | This course covers matrices, linear algebra and vector operations. |
Machine Learning
| Course | Topics | Length | Price | Description |
|---|---|---|---|---|
| Machine Learning | Supervised and unsupervised learning, bias/variance theory, neural networks | 3 | 80 | The course is taught by Andrew Ng, a well-known figure in Data Science. It covers (i) supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks), (ii) unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning), and bias/variance theory. This is simply a very good course to get started with Data Science. |
| Advanced Machine Learning Specialization | NLP, reinforcement learning, deep learning, Bayesian methods | 7 | 350 | This specialization is built in collaboration with Yandex School of Data Analysis. Seven courses will walk you through Kaggle challenges, text analysis, computer vision, and more. |
| 6.00.2x: Introduction to Computational Thinking and Data Science | Python, probability, distributions, Monte Carlo simulations | 3 | 50 | This is purely for people who need to upgrade their skills in computation and data exploration. It’s good that Python is used in the labs. According to reviews, it presents information for beginners in a good and clean way. It is worth taking after another MITx course, 6.00.1x. |
| Learning From Data | Overfitting, Regularization, Validation, Kernel Methods | 2 | free | This machine learning course covers basic theory, algorithms, and applications. Machine learning is a key technology in Big Data and in many financial, medical, commercial, and scientific applications. |
Deep Learning and Neural Networks
| Course | Topics | Length | Price | Description |
|---|---|---|---|---|
| Deep Learning Specialization | Neural networks, Convolutional networks, RNNs, LSTM, Adam, Dropout, BatchNorm, Xavier/He initialization | 5 | 250 | Andrew Ng and his colleagues created this specialization to give a deeper introduction to Deep Learning and Neural Networks. Its five courses walk you through theory and practice using examples in Python and TensorFlow. |
| Neural Networks for Machine Learning | Neural networks, calculus, Python, Octave | 4 | 50 | This course covers the very important basics, though it might not be so modern. It uses Octave, which is a bit weird for me personally. There is a review here on this course. Anyway, I would strongly recommend it because it brings clarity to your view of NNs. |
| Introduction to Deep Learning | Recurrent Neural Network, TensorFlow, Convolutional Neural Network, Deep Learning | 2 | 100 | This course introduces the core concepts of deep learning. It is not for absolute beginners: you need a good understanding of programming in Python. It covers TensorFlow, Recurrent Neural Networks, Convolutional Neural Networks, and more. |
Data Engineer Concepts
| Course | Topics | Length | Price | Description |
|---|---|---|---|---|
| DAT228x: Developing Big Data Solutions with Azure Machine Learning | Azure Machine Learning, Predictive Models, Azure Data Factory | 1 | 100 | During the course, you’ll build a pipeline to get data, train a model, and update the model with Azure services. I liked this course because it gives detailed information about what you need to do to build the pipeline, which services you need, and why. |
| Microsoft Azure HDInsight Big Data Analyst (X Series) | Hadoop, HBase, Storm, Spark, Azure HDInsight, Hive and Pig | 3 | 300 | The series consists of three courses and covers a very relevant set of technologies. I took part of the first course of the series and found that it explains the tools in detail, enough to understand what is happening under the hood. |
| Big Data for Data Engineers Specialization | HDFS, MapReduce and Spark RDD, Hive, Spark SQL, DataFrames and GraphFrames, Python | 6 | 300 | The specialization consists of 5 courses. They were created in collaboration with Yandex, Odnoklassniki, and MIPT. All teachers have solid expertise in the big data industry. According to some reviews, the lectures have well-structured materials and a clear way of storytelling. |
Length & Price
Lengths and prices are approximate. Some of the courses mentioned are available for free, and only the certificate has to be paid for. Free access varies from platform to platform and from course to course. For example, Coursera may hide assessments behind the paywall while the materials remain available. On edX, only the certificate is paid.
If you want to take on a learning challenge in the field of Data Science in the meantime, don’t forget to check our Cases and Solution in the Learn section of our Data.Platform.