A History of Data Science
Data science has quickly become one of the most influential fields in technology, and it has led to many powerful and groundbreaking inventions. In the previous article, I discussed several ways in which data science is used, and we will be delving deeper into the applications in a later post. However, not many people know the history of data science. Data science is entwined with statistics, computing, and artificial intelligence. From breakthroughs in statistics to the rise of machine learning, data science has been constantly changing. Today, we’re going to explore the development and history of data science.
The foundation of data science comes from math, specifically statistics. Let’s start in the 1600s, with someone that everyone has heard of, Isaac Newton. What many people don’t realize is that calculus, which Newton and Gottfried Wilhelm Leibniz developed independently, laid some of the groundwork for data science. Calculus is used to optimize machine learning models, especially during training, where methods such as gradient descent rely on derivatives. Then, we have Pierre-Simon Laplace, who made major contributions to probability theory, which machine learning uses for probabilistic modeling and prediction. Carl Friedrich Gauss contributed to the method of least squares, a technique that fits a model by minimizing the sum of squared errors between its predictions and the observed data. He is also known for the Gaussian (normal) distribution, a bell-shaped distribution that is symmetric about its mean and is central to statistical analysis and machine learning.
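To make the least squares idea a little more concrete, here is a minimal sketch in Python that fits a straight line to a handful of points. The function name `fit_line` and the sample data are my own, made up purely for illustration, and the code uses the standard closed-form formulas for a single input variable rather than any particular library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line that minimizes the
    sum of squared errors between predictions and observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With real, noisy data the points would not sit exactly on a line, and the fitted line would simply be the one with the smallest total squared error.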
Now, let’s move on to the mid-1800s. Hermann Grassmann contributed to the development of vector spaces, which are vital for representing and processing data in multiple dimensions. Machine learning models also use vectors to represent data points. Around the same time, Arthur Cayley helped develop matrix theory, which is closely tied to the study of vector spaces. Both ideas are crucial to data representation and manipulation. In the early 1900s, Ronald Fisher developed or popularized statistical methods such as maximum likelihood estimation, hypothesis testing, analysis of variance, and the p-value. These methods are used in data analysis to evaluate hypotheses and determine statistical relationships. Machine learning also draws on these concepts, for example when estimating model parameters from data. Another very important mathematician is Leonid Kantorovich. In 1939, he developed linear programming, a cornerstone of optimization theory, which helps in designing efficient algorithms that can learn from data.
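As a small illustration of what it means to treat data points as vectors, the sketch below represents two points as plain Python lists and computes their dot product and the distance between them. The helper names `dot` and `euclidean_distance` and the numbers are assumptions chosen for the example, not something taken from a specific library.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two points given as vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two made-up data points, each with two features.
p = [3.0, 4.0]
q = [0.0, 0.0]
print(dot(p, p))
print(euclidean_distance(p, q))
```

Operations like these are the building blocks behind many models, which compare data points by measuring how far apart or how aligned their vectors are.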
So far, I have been talking about the mathematicians who created the foundations of data science. Now, let’s move to the time when data science became a distinct subject. In 1959, Arthur Samuel coined the term “machine learning.” Just a few years later, in 1962, John Tukey wrote a paper called The Future of Data Analysis. In his paper, he talked about changes in statistics, the connections between statistics and computers, and computers being used to solve mathematical problems. In 1974, Peter Naur published the Concise Survey of Computer Methods and used the term “data science” throughout the publication. In it, he wrote: “The usefulness of data and data processes derives from their application in building and handling models of reality.” The International Association for Statistical Computing was formed in 1977, and Tukey published a book in the same year, titled Exploratory Data Analysis. It emphasized the importance of using data to suggest hypotheses worth testing. The first Knowledge Discovery in Databases (KDD) workshop was held in 1989.
Let’s move on to more recent events. The Data Science Journal was first published in 2002 by the Committee on Data for Science and Technology (CODATA) of the International Council for Science. It covered areas like data science applications, risk issues, and data systems. Hadoop, which I talked about in the previous post, was released in 2006. In 2008, “data scientist” became a popular job title thanks to DJ Patil and Jeff Hammerbacher (then at LinkedIn and Facebook, respectively). Job listings for data scientists reportedly increased by 15,000% in 2011, and in 2012, Harvard Business Review called data scientist “the sexiest job of the 21st century.”
Today we explored the history of data science, starting from statistics and stretching all the way to Big Data. Although that history already spans centuries, the field is still evolving quickly. As data science continues to grow and impact every aspect of people’s lives, more and more history will be made.