“Data Scientist: The Sexiest Job of the 21st Century” – Harvard Business Review
Data Scientist became a buzzword after Harvard Business Review called it the “Sexiest Job” of the 21st century. A Data Scientist is someone who is good at Data Science and uses it to accomplish extraordinary feats. But what exactly is Data Science? In this post, we will take a closer look at it.
Definition of Data Science
Data Science is a young and still-maturing field of study compared to Physics, Mathematics or Art. Because of this, it does not yet have a single, widely agreed definition. Wikipedia defines it as follows:
Data Science is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.
Wikipedia presents one more definition:
Data science is a “concept to unify statistics, data analysis and their related methods” in order to “understand and analyze actual phenomena” with data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization.
If you dig through the web, you can find many more definitions of Data Science. In the abstract, we can say that Data Science is the study of data. Here are a few more definitions.
- Data Science is the process of using data to understand different things, and to understand the world.
- Data Science is the art of uncovering the insights and trends that are hiding behind the data.
- Data Science is the process of using automated methods to analyze massive amounts of data and to extract knowledge from them.
- Data Science is a field that encompasses anything related to data cleansing, preparation, and analysis. Put simply, Data Science is an umbrella term for techniques used when trying to extract insights and information from data.
- Data Science combines different fields of work in statistics and computation in order to interpret data for the purpose of decision making.
- Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies.
Who is a Data Scientist?
We have seen different definitions of Data Science. Now let’s unfold what skills you need to become a Data Scientist.
Data Scientist: A person who is better at statistics than any software engineer and better at software engineering than any statistician.
We hear this often. But does it define the role of a Data Scientist? With Statistics and Software Engineering alone, can you become a Data Scientist? Yes and no. Let me explain in depth. Data scientists use their knowledge of statistics and modeling to convert data into actionable insights.
This Venn Diagram explains very well who a Data Scientist is. A Data Scientist is someone who has enough mathematical and statistical knowledge and is very good at programming. These two skills are enough to become good at machine learning. But the skills of a Data Scientist go beyond that. A Data Scientist also possesses the substantive expertise to ask good questions and understands his or her field, whether it is medicine, business or education.
Skills you need to become a Data Scientist
- Good Programming Skills – Whether it’s Python, R, Julia or Octave, you have to be really good at at least one programming language.
- Mathematics – Yes, you need to be good at math. This covers topics from Linear Algebra to Calculus.
- Statistics – Probability theory and Statistics
- Machine Learning
- Databases – As a Data Scientist you have to store data, and this is usually done in databases, e.g. MySQL, PostgreSQL, MongoDB, Cassandra, etc.
- Big Data
- Communication Skills
- Domain Expertise
- and Curiosity
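To make the first few skills a little more concrete, here is a minimal Python sketch that combines programming and statistics to turn a handful of numbers into a simple insight. The data and the names `ad_spend`, `sales` and `fit_line` are invented purely for illustration; they do not come from any real dataset or library, and real projects would normally rely on dedicated tools rather than a hand-written fit.

```python
from statistics import mean

# Hypothetical data, invented for this example:
# weekly advertising spend and the sales observed that week.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(ad_spend, sales)
print(f"Each extra unit of ad spend goes with about {slope:.2f} more units of sales.")
```

The fit itself is the easy part. Deciding whether the relationship is meaningful, and what to do about it, is where domain expertise and communication skills come in.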
That answers the questions “What is Data Science?” and “Who is a Data Scientist?” We will explore more about Data Science in upcoming posts. Stay tuned.