It is no exaggeration to say that data-driven management strategies are now indispensable in every industry. As a result, global demand is growing for data scientists who can extract new value from data and bring innovation to business.
However, relatively few companies correctly understand what data science means and what role data scientists play. This article explains what a data scientist actually does in plain terms, using recent examples.
Table of contents
What is data science?
Two Reasons to Expect Data Science
What is a data scientist?
5 Use Cases of Data Science
Conclusion
What is data science?
First, let’s review the definition, history, and types of data science.
Definition of data science
Data science is an academic field that draws on several research areas, including statistics, IT, mathematics, and business administration. Using knowledge from these areas, it uncovers the insight and value hidden in data aggregated from corporate business systems, the Internet, questionnaires, and other sources, and applies them to solving social and management issues.
History of data science
The term “data science” is generally said to have first appeared in “Concise Survey of Computer Methods,” published in 1974 by Danish computer scientist Peter Naur. Since then, with the evolution of AI and machine learning and the advent of big data, data science in its current form has been in the limelight since around 2010 as a useful method for solving social and management issues.
Types of data science
Data science can be classified into three categories based on the types of data to be analyzed and statistical methods.
Data Aggregation and Graphing
Staring at raw numbers will not show you how the data can be used to improve management. Aggregating the data and visualizing it with pie charts, bar charts, and similar graphs makes the meaning of the numbers easier to understand. This helps information flow smoothly within the company and leads to faster decisions on business improvement and new business development.
For these uses, IT vendors provide Office applications such as Excel and Access, as well as tools that can instantly graph large amounts of data.
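As a rough illustration of aggregation before graphing, the sketch below totals sales by category and converts the totals into percentage shares, which is the kind of figure a pie chart displays. The records, category names, and function names are invented for this example, and the text bar chart merely stands in for a real charting tool.

```python
from collections import defaultdict

# Hypothetical sales records: (product category, sales amount).
sales_records = [
    ("Food", 1200),
    ("Beverages", 800),
    ("Food", 600),
    ("Household", 400),
    ("Beverages", 1000),
]


def aggregate_by_category(records):
    """Sum the sales amount for each category."""
    totals = defaultdict(int)
    for category, amount in records:
        totals[category] += amount
    return dict(totals)


def to_shares(totals):
    """Convert totals into percentage shares, as a pie chart would show them."""
    grand_total = sum(totals.values())
    return {name: round(100 * value / grand_total, 1) for name, value in totals.items()}


totals = aggregate_by_category(sales_records)
shares = to_shares(totals)

# Print a simple text bar chart, one '#' per 5 percentage points.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10} {'#' * int(share // 5)} {share}%")
```

Even this small summary shows at a glance which categories dominate sales, something that is hard to read off the raw records.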
Statistical inference or prediction
Data science can also be used for demand forecasting and economic forecasting based on statistical data. Conventional statistical analysis methods can make simple predictions about the future from a modest amount of data. However, with the evolution of IT, the amount and variety of data handled by companies has increased explosively, and conventional statistical methods alone can no longer be expected to make highly accurate predictions.
Recently, machine learning has been widely used in the field of statistical analysis to make highly accurate predictions from large amounts of complex data.
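To make the idea of a simple prediction concrete, here is a minimal sketch of a conventional statistical method: fitting a straight line to past monthly demand by least squares and extending it one month ahead. The demand figures and the function name are made up for illustration. This is not a machine learning model, and real forecasting would account for seasonality and many more variables.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly demand for the past six months.
months = [1, 2, 3, 4, 5, 6]
demand = [102, 108, 121, 129, 141, 149]

slope, intercept = fit_line(months, demand)

# Extend the fitted trend to month 7.
forecast = slope * 7 + intercept
print(f"Forecast for month 7: {forecast:.1f}")
```

A single trend line like this breaks down quickly once the data becomes large and varied, which is where the machine learning methods mentioned above come in.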
Artificial intelligence (AI)
Text is not the only data to be analyzed in data science. Big data also includes unstructured data such as images, sounds, and videos, which also contain useful information. Deep learning, which sparked the AI boom in recent years, enables image analysis, voice analysis, language processing, etc., and the results of such analysis are useful for solving management and social issues.
Two Reasons to Expect Data Science
Below are two reasons why data science is gaining traction in the business field.
Easier to collect and analyze big data
Data science has attracted great expectations because the evolution of IoT, cloud, AI, and related technologies has made it possible to collect and analyze big data easily and at low cost. The expansion of Internet services for general consumers and the spread of smartphones and tablets have also made it easier to obtain the big data generated there.
Expectations for profit expansion
Another reason for the growing demand for insights produced by data science is their potential to bring large profits to companies in any industry. The 2020 edition of the White Paper on Information and Communications reports that many companies are working on data utilization in areas such as “management and organizational reforms,” “planning and development of products and services,” and “marketing.”
Questionnaire result diagram of business areas where data is utilized in Japanese companies
Source: Ministry of Internal Affairs and Communications “2020 White Paper on Information and Communications”
As you can see from the graph below, there are many companies that want to increase their profits by improving the amount of data they collect and their analysis techniques.
Questionnaire results on future data utilization efforts by Japanese companies
Source: Ministry of Internal Affairs and Communications “2020 White Paper on Information and Communications”
What is a data scientist?
The growing demand for data science has created a profession in the business world known as a data scientist.
There is no universal qualification required to become a data scientist. However, because too few people have the skills companies are looking for, training data scientists has become a national issue.
From here, let’s take a look at the actual job content of a data scientist, the skills required, and the future.
Data scientist job description
A data scientist’s job starts with extracting the necessary information from big data and using tools to unify its format so that the various types of data can be analyzed efficiently.
After that, the data scientist draws on the knowledge gained from multifaceted analysis of the data to propose sales forecasts, plans for new products and services, and improvements to business processes.
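The first step described above, unifying the format of data gathered from different sources, can be sketched as follows. The field names, date formats, and helper functions are assumptions made up for this example. Real projects typically rely on dedicated data preparation tools.

```python
from datetime import datetime

# Hypothetical date formats used by different source systems.
KNOWN_DATE_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y"]


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_record(record):
    """Bring one raw record into a single, consistent format."""
    return {
        "customer": record["customer"].strip().title(),
        "date": normalize_date(record["date"]),
        # Amounts may arrive as numbers or as strings such as "1,200".
        "amount": int(str(record["amount"]).replace(",", "")),
    }


raw_records = [
    {"customer": " alice SMITH ", "date": "2023/04/05", "amount": "1,200"},
    {"customer": "Bob Jones", "date": "05.04.2023", "amount": 800},
]

for record in raw_records:
    print(normalize_record(record))
```

Once every record shares the same date format, name style, and numeric type, the records can be aggregated and analyzed together.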
3 skills a data scientist needs
A data scientist needs three essential skills:
IT knowledge
The following IT skills are required for collecting, storing, and processing data.
Programming skills in languages such as Python and R
Database operation skills
Skills in distributed processing of big data, using frameworks such as Hadoop
In addition, to understand the background of management issues, it helps to have knowledge of the systems through which big data is collected, such as IoT and websites, and of security.
Statistics
To analyze the collected data properly, the following knowledge of statistics is also required.
Knowledge of mathematics such as probability and statistics, Bayesian statistics, linear algebra, and Laplace transforms
Data analysis skills such as statistics, pattern recognition, and AI
Working with data analysis tools such as SPSS and SAS
Business skills
Since proposing ways to use the analyzed data is also an important job of data scientists, the following business skills are also required.
General knowledge of operations to be improved
Logical thinking to solve problems
Presentation skills to convey analysis results to management and business departments in an easy-to-understand manner
Could data scientist jobs disappear in the future?
Despite the serious shortage of human resources, it is said that the demand for data scientists may disappear in the near future. The reason is that advances in AI may automate many of the tasks that data scientists currently do.
However, AI’s strength lies in processing large volumes of data according to established rules. No matter how much AI evolves in the future, there will still be many jobs that require creativity, such as the development of new theories and models, that can only be handled by humans. That said, there is no doubt that constantly improving one’s skills in preparation for the emergence of new technologies is a fundamental requirement for a data scientist who can be active for a long time.
5 Use Cases of Data Science
Finally, I will introduce five use cases of data science.
Ministry of Health, Labour and Welfare: “Nationwide Survey for Novel Coronavirus Countermeasures” using LINE
Data science is already being used around the world to prevent the spread of the novel coronavirus. One example is the “Nationwide Survey for Novel Coronavirus Countermeasures” conducted by the Ministry of Health, Labour and Welfare and LINE.
In this survey, which began at the end of March 2020, more than 83 million LINE users nationwide have been asked, over five rounds, about their occupation, current work style, daily life, and recent health condition.
The main purpose of this survey is to identify occupations with a high risk of infection and areas with a large number of infected people from the results of the questionnaire, and to use it for infection prevention measures.
Sushiro: Use data from over 1 billion records for demand forecasting and sales analysis
At Sushiro, IC tags are attached to sushi plates in order to collect order information for each table, such as what kind of sushi items were eaten at which restaurant. More than 1 billion items of data collected annually from IC tags are used to predict demand, such as store congestion and time spent at each seat, and control the amount of sushi toppings and plates that are sent to the lane.
In addition, the 4 billion items of sales data accumulated in-house are used for various purposes such as cost reduction and new product development.
Osaka Gas: Predicting the Cause of Gas Equipment Failures and Improving Customer Satisfaction
Osaka Gas has built a system that automatically extracts five parts that are likely to be the cause of failure from the operation data of customers’ gas equipment and the repair history accumulated at the call center.
Since workers can prepare parts in advance and visit customers, the number of cases in which repairs are completed in one visit has increased. As a result, the work efficiency of workers and customer satisfaction are improved at the same time.
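The idea of extracting likely failure parts from repair history can be illustrated with a simple frequency count. The sketch below is not Osaka Gas’s actual system, whose method is not described here. The symptoms, part names, and function name are invented, and the approach shown is plain counting of which parts were replaced in past repairs for the same symptom.

```python
from collections import Counter

# Hypothetical repair history: (symptom reported, part that was replaced).
repair_history = [
    ("no hot water", "igniter"),
    ("no hot water", "gas valve"),
    ("no hot water", "igniter"),
    ("strange noise", "fan motor"),
    ("no hot water", "thermistor"),
    ("no hot water", "igniter"),
    ("no hot water", "gas valve"),
]


def likely_parts(history, symptom, top_n=5):
    """Return up to top_n parts most often replaced for the given symptom."""
    counts = Counter(part for reported, part in history if reported == symptom)
    return [part for part, _ in counts.most_common(top_n)]


# Parts a worker might bring along for a "no hot water" call.
print(likely_parts(repair_history, "no hot water"))
```

A production system would combine equipment operation data with the repair history, but the output has the same shape: a short ranked list of parts to prepare before the visit.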
Intel: Reduced chip quality inspection saves $3 million
Since 2012, Intel in the United States has been using data science to shorten its chip manufacturing process.
By analyzing historical data collected from chips before shipment and restricting the roughly 19,000 quality tests to a targeted subset, Intel significantly shortened the period from manufacturing to shipment. As a result, it reduced manufacturing costs by as much as $3 million.
Benesse: Used to design and improve learning materials and set goals for children
Benesse has established a specialized analysis center to analyze children’s learning records from existing questionnaires, observation records, and digital teaching materials.
The learning records of children from elementary school to high school collected by the analysis center are used to design and improve teaching materials. They are also used to help set goals for children and predict future achievement.
Conclusion
Data is now a valuable information asset with the potential to bring enormous benefits to the enterprise. Even data that at first glance seems to have nothing to do with business, such as tweets on social media or old files stored on file servers, can lead to new businesses through analysis.
However, data science requires a high degree of expertise and is not something that anyone can easily do. For those who are interested in data science, it may be a good idea to take the first step by using familiar data accumulated in the company and reviewing its value.
Background
The term “data science” has gained general attention since the 2010s, but its origins date back more than 50 years. People who use statistical analysis in their work have long existed, and they were traditionally called researchers.
Since the 2000s in particular, methods of using data have developed significantly, and today there are concerns about a shortage of skilled people in many fields. Even limiting the view to the 2000s onward, the following developments stand out.
Windows and SaaS (Software-as-a-Service) spread, and owning a personal computer became common. The Internet permeated daily life, and the amount of data being exchanged increased.
In 2002, the “Data Science Journal” was launched to publish papers on database management.
In 2006, artificial intelligence advanced dramatically through deep learning using autoencoders.
In 2008, people calling themselves “data scientists” appeared at Google and others, and their skills and job descriptions were discussed.
In 2010, the term “big data” was proposed due to the significant increase in the amount of data transferred over the Internet.
In 2012, a team that adopted deep learning achieved remarkable results in a competition for image recognition accuracy and won the championship. Google also announced that it succeeded in recognizing cats from YouTube images, leading to the third artificial intelligence boom that continues to this day.
In addition, Facebook started service in 2004, YouTube in 2005, Twitter in 2006, and the iPhone was released in 2007, and the amount of data increased at the same time as the birth of things indispensable to modern life.
Amid this rapid growth in the amount of data being handled, data utilization technology has produced excellent results, yet diversifying needs often leave analysis unfocused. This created market demand for proposals tied directly to business issues, and the concept and occupation of the data scientist can be said to have been born from that demand.
Demand for data scientists
AI (Artificial Intelligence) is beginning to surpass humans in certain areas, and the arrival of the AI era is expected beyond that. Not only deep learning, which is the core technology, but also the data scientists who handle it are getting more attention.
Since 2012, some universities in Japan have established data science faculties, and the number of graduate schools offering master’s and doctoral degrees has increased, which shows the importance of learning data science and the value of people trained in it. Against this background, data scientists are expected to be in great demand in the future.
On the other hand, some discuss the possibility that data scientists will be replaced by AI. It is certainly likely that AI, running on computational resources such as supercomputers and quantum computers, will take over identification and prediction tasks that require both high speed and accuracy. Fundamentally, however, how to use data to change society and where to find value will still be discussed, designed, and implemented by humans, with AI serving as a partner that provides computational resources. That is where the value of a data scientist lies, and why a data scientist should not be merely an “analyst.” Demand for people who can work alongside those in the field and produce results by proposing data-based improvements will not disappear and will continue to increase.
Differences with Data Analysts
Data scientists and data analysts share some tasks, but a data analyst specializes in collecting and analyzing data, while a data scientist applies statistics and computer science with the aim of solving the problems that companies face. The big difference between the two is that data scientists cover a wider area than data analysts.
As a prerequisite for using data, data scientists identify and prioritize problems, define the issues to address, clarify the goals to be achieved, and formulate hypotheses. In the past, this was the responsibility of business planning departments and consultants, but now that management strategies using big data have become commonplace, it has become part of a data scientist’s responsibilities.
In short, data analysts mainly specialize in “collecting” and “analyzing” data, while data scientists are responsible for a wider range of work, including “extracting problems,” “collecting and analyzing data,” “building hypotheses,” and “implementing algorithms and prediction models.”
Data scientist job description
The actual work of a data scientist is diverse, but when summarized, it is generally divided into four phases, and the flow is as follows.
Data scientist workflow
(1) Analysis planning → Analysis project launch → Work design after incorporation
(2) Approach design and data collection → Structured data processing/Unstructured data processing
(3) Data analysis → Data visualization → Evaluation
(4) Incorporation into work → Work evaluation and improvement
A data scientist’s main job is to analyze a large amount of accumulated data and “utilize” the analysis results. There is a tendency to focus on big data analysis, but the main focus is to contribute to business according to on-site judgment.
Skills required for a data scientist
1. Data Science Skills
—Basic mathematics, understanding and verification of data, machine learning techniques, etc.
2. Data engineering skills
—Environment construction, programming, IT security, data processing, etc.
3. Business skills
—Logical thinking, task definition, activity management, etc.
As the history of the profession and the comparison with other occupations suggest, data scientists are expected to build on the statistical analysis profession with skills in machine learning and the ability to manage activities in real business settings.
In conclusion
So far, we have introduced the background to the birth of data scientists, future demand, job descriptions, and the necessary skills. The wide range of work and the expertise involved mean the field is expected to remain active in the future. At the same time, the more the standard rises as technology develops, the more each person’s individual strengths and character will matter.
A data scientist can be said to be a rewarding profession that can further contribute to the development of society by honing their expertise and being active in the field of business.
What is a data scientist? Explanation of job content and necessary skills
With the rapid spread of smartphones that have permeated our lives and the development of information processing technology, the amount of data that individuals and companies can store and handle has increased, making large-scale data utilization possible. Real-time processing of huge amounts of data has become possible, and the importance of data utilization continues to grow.
Under these circumstances, there is a demand for “data scientists,” human resources who can propose new value from data utilization. In this article, we will introduce what kind of human resource a data scientist is, based on the background and future prospects.