Introduction to Data Science

Figure 1 : Structure of the Article

What is Data Science?

Data science, as the name suggests, is a field of science that deals with data. It combines the power of computers and mathematics to analyze data, extract important information from it, and process this information to produce a useful output.

Figure 2 : What is Data science

How can we use Data Science?

There are two ways in which we can use data science:

  1. Finding a solution to a problem by analyzing the data.
  2. Analyzing the data to come up with new ideas that can be implemented, or with new problems that can be solved with it.

Classifications of Data Science

Data Science can be classified into the following:

  1. Data Collection
  2. Data Analysis
  3. Data Visualization

We will take a brief look at each of these three…

Data Collection

In philosophy, data are the things that are known or assumed as facts and that form the basis of reasoning and calculation. Humans have been collecting data for ages.

Our ancestors recorded data on rocks and stones to keep count of their cattle, to preserve memories of their lives, or to pass on the knowledge they had gained to the next generation.

In the modern world, the basic purpose of collecting data is to use it to find solutions to existing problems.

We collect data mainly in these forms:

  • Sound data 
  • Visual data
  • Text data

Types of data

The two main types of data are:

Structured data

Structured data is information that is organized in a predefined format. For example, a data set that contains names and roll numbers in two different columns.

Unstructured data

Unstructured data is information that has not been processed into a predefined format. Examples are IoT sensor data, emails, chats, etc.
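The difference between the two types can be sketched in a few lines of Python. This is only an illustration: the student records, the email text, and the helper `roll_numbers_in_text` are made up for the example, and the assumption that a roll number is a three-digit number is ours.

```python
import re

# Structured data: every record has the same named fields (columns).
students = [
    {"name": "Asha", "roll_number": 101},
    {"name": "Ravi", "roll_number": 102},
]

# Unstructured data: free text with no predefined fields.
email = "Hi, this is Ravi. My roll number is 102 and I missed class today."


def roll_numbers_in_text(text):
    """Pull candidate roll numbers (assumed to be 3-digit numbers) out of free text."""
    return [int(match) for match in re.findall(r"\b\d{3}\b", text)]


# Structured data can be queried directly by field name.
names_by_roll = {row["roll_number"]: row["name"] for row in students}

# Unstructured data needs an extraction step before it can be used.
found = roll_numbers_in_text(email)
print(names_by_roll[found[0]])
```

The structured records can be looked up immediately, while the email has to be searched for a pattern first.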


Data Analysis

Now that we have collected the data, we need to analyze it to find a solution to our problem.

Data analysis is the process of examining data using tools such as R, Python, or MATLAB. The libraries available in these programming languages let us analyze data, for example by plotting graphs or charts.

For example, consider the problem of housing price prediction. Imagine we have a dataset containing house prices over the past 10 years, and we would like to use it to predict the price of a house in the coming year.

One way to do this is to plot a graph with the years on the x-axis and the house prices on the y-axis. Plotted this way, the data shows whether prices are increasing or decreasing over time.

Using this trend, we can estimate the likely change in the price of a house in the coming years.
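As a rough sketch of the same idea in code, we can fit a straight line through the yearly prices and extend it by one year. The prices below are invented and perfectly linear so the arithmetic is easy to follow; real data would be noisier, and a straight line is only one possible way to model a trend. The function name `fit_line` is our own.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical average house prices (in thousands) for 10 years.
years = list(range(2011, 2021))
prices = [200 + 10 * i for i in range(10)]  # made-up data rising by 10 per year

slope, intercept = fit_line(years, prices)

# Extend the fitted trend one year beyond the data.
predicted_2021 = slope * 2021 + intercept
print(round(slope, 2), round(predicted_2021, 2))
```

The slope is the yearly change in price that the line has picked up from the data, and the prediction is simply that trend continued for one more year.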

Data Visualization

Data visualization explains data through graphical representations. It helps the data analyst understand patterns, trends, and outliers in the data.

The data analyst can also use visualization techniques to present their findings to the customer in the form of graphs, charts, and maps.

Some of the Python libraries for data visualization are:

  1. Plotly
  2. Seaborn
  3. Ggplot
  4. Altair
  5. Matplotlib
  6. Bokeh
  7. Folium

If we are not using a programming language for visualization, we can use the tools below:

  1. Google charts
  2. Tableau
  3. Xplenty
  4. Hubspot
  5. Whatagraph

Data visualization example

We shall see an example of visualizing data about five machines (A, B, C, D and E) for the period 01-10-2020 to 07-10-2020, done in the Python programming language using the Plotly library.

Figure 3 : Shift wise down time data
Figure 4 : Machine performance
Figure 5: Daily performance
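The code behind these figures is not part of this article, so the following is only a minimal sketch of how a shift-wise downtime chart could be produced with Plotly. The records, the shift names, and the function names `downtime_by_shift` and `plot_downtime` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical downtime records: (date, machine, shift, downtime in minutes).
records = [
    ("01-10-2020", "A", "Shift 1", 30),
    ("01-10-2020", "A", "Shift 2", 15),
    ("01-10-2020", "B", "Shift 1", 45),
    ("02-10-2020", "B", "Shift 2", 20),
    ("02-10-2020", "C", "Shift 1", 10),
]


def downtime_by_shift(rows):
    """Sum the downtime minutes for each shift."""
    totals = defaultdict(int)
    for _date, _machine, shift, minutes in rows:
        totals[shift] += minutes
    return dict(totals)


def plot_downtime(totals):
    """Draw a bar chart of downtime per shift (requires the plotly package)."""
    # Imported here so the aggregation above also runs without plotly installed.
    import plotly.graph_objects as go

    shifts = sorted(totals)
    fig = go.Figure(data=[go.Bar(x=shifts, y=[totals[s] for s in shifts])])
    fig.update_layout(
        title="Shift-wise downtime",
        xaxis_title="Shift",
        yaxis_title="Downtime (minutes)",
    )
    fig.show()


totals = downtime_by_shift(records)
print(totals)
# plot_downtime(totals)  # uncomment to open the chart
```

Keeping the aggregation separate from the plotting makes it easy to check the numbers before they are drawn.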

Subsets of Data Science

Figure 6 : Subsets of Data Science

Artificial Intelligence

Artificial Intelligence (AI) is the intelligence that enables machines to think like a human and find solutions to problems with little or no human intervention. There are three main types of AI:

Artificial Narrow Intelligence (ANI)

Narrow AI is the most common form of AI in machines today. ANI allows a machine to carry out a particular task, or a small set of tasks, on its own with little or no human intervention.

It does not have emotions, feelings, or consciousness, and it cannot do a wide variety of tasks that it was not programmed for.

Examples:

  • Self-driving cars
  • Autopilot
  • Spam Filters
  • Chatbots

Artificial General Intelligence (AGI)

This type of AI does not exist yet and can so far be seen only in sci-fi movies. It would exhibit human-level intelligence, would be hard to distinguish from a human, and would be able to show emotional intelligence.

Such machines would think like human beings and solve problems based on the situation rather than just system needs. In other words, if a particular solution to a problem might be harmful to someone else, the machine might choose another solution.

Artificial Super Intelligence (ASI)

This type of AI would have a level of intelligence far superior to that of humans and would be able to think much faster than us. Such machines would have greater problem-solving skills and would update themselves, each version more capable than the one before, all in a matter of days.

They would be able to evolve quickly into better versions of themselves. This type of intelligence could even be a threat to our existence.

Machine Learning

Machine learning is the process of teaching a machine to accept inputs and perform calculations, using algorithms built upon statistics and probability, to produce an output that is close or equal to the expected output.

We can see machine learning in our day-to-day life. For example, the recommendation systems behind YouTube and Instagram ads are based on machine learning: data about what the user clicks and likes the most is fed into a system, the system learns the user’s interests, and it suggests the content the user is most likely to be interested in.

Machine learning is classified mainly into 3 types of learning:

1. Supervised Learning

Let’s say we want our machine to pick out images of apples from a set of other images. In supervised learning, we initially provide the ML model with input images and labels that name the fruit in each image.

An ML model is a set of algorithms that learns features from input data and gives an output.

The model compares each image with its label and learns the features that map a particular image to a particular label.

When we then give the model a new image, it can identify the same features it saw in the training data and map the image to the corresponding label.

Common supervised learning problems:

  • Classification: Classification groups the output into categories that were previously given to the model as labels, for example 0 and 1, cat and mouse, or apple and mango.
  • Regression: Regression is used to predict a continuous quantity, for example the current temperature in a room or a stock market price.
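To make the idea of learning from labelled examples concrete, here is a very small classifier, assuming each fruit has already been reduced to two numeric features. It uses the nearest-neighbour rule, which is just one simple supervised method and not necessarily what an image model would use. The feature values and the function name `nearest_neighbour` are invented for the example.

```python
def nearest_neighbour(train, new_point):
    """Return the label of the training example closest to new_point."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _features, label = min(train, key=lambda pair: squared_distance(pair[0], new_point))
    return label


# Hypothetical labelled data: (weight in grams, redness from 0 to 1) -> fruit name.
training_data = [
    ((150, 0.9), "apple"),
    ((160, 0.8), "apple"),
    ((300, 0.2), "mango"),
    ((320, 0.3), "mango"),
]

# New, unlabelled fruits are given the label of their closest labelled example.
print(nearest_neighbour(training_data, (155, 0.85)))
print(nearest_neighbour(training_data, (310, 0.25)))
```

In this toy data the weight dominates the distance because it is on a much larger scale than the redness; in practice features are usually rescaled before comparing them.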

2. Unsupervised Learning

In unsupervised learning, the model is provided with input data without any labels. The model categorizes the data into different groups based on similar features. Unsupervised learning is mainly used for two types of problems:

  • Clustering: Clustering identifies features that are similar across data points and groups the data according to these similarities, without being told the groups in advance. For example, clustering people into different groups based on the spread of COVID-19 in their area.
  • Association: Association finds items that tend to occur together. For example, associating a particular product with a buyer based on another product they bought recently (mapping).
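Clustering can be sketched with k-means, one common clustering algorithm, applied here to a single number per area. The case counts, the starting centroids, and the function name `k_means_1d` are assumptions made for the example; no labels are given to the algorithm.

```python
def k_means_1d(values, centroids, iterations=10):
    """A very small k-means for one-dimensional data."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical, unlabelled case counts per area.
cases = [2, 3, 4, 50, 52, 55]
centroids, clusters = k_means_1d(cases, centroids=[0.0, 60.0])
print(clusters)
```

The algorithm ends up with one group of low-count areas and one group of high-count areas, even though nobody told it which area belongs where.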

3. Reinforcement Learning

It’s like teaching a child what is right and wrong. If the child does something right, we show our appreciation with chocolates, gifts, and so on; if the child does something wrong, we give corrective feedback. The next time, the child knows whether an action is good or bad from the feedback or rewards received for doing the same thing before.

So, reinforcement learning is a reward-based system in which an agent interacts with an environment by performing actions and learns from the rewards (either negative or positive) it receives from an interpreter. There is no predefined data and no supervision; learning follows a trial-and-error method. The agent has to come up with an output by itself, and we only tell it whether that output is right or wrong.

Examples:

Self-driving cars, where the environment is the road and the interpreter (error signal generator) is a human in the driving seat. The human sends a signal based on the direction the car takes, the lane changes it makes, or whether it follows the rules while parking.

An automated machine that sorts products into groups based on their weight. The person monitoring the task generates an error signal: negative if the machine classifies a product wrongly, and positive if it classifies the product correctly.

In addition to these, there is another type of learning called semi-supervised learning, in which some of the data is labelled and the rest is unlabelled.
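The sorting-machine example can be turned into a small trial-and-error loop. This is a deliberately simplified, one-step sketch (each decision is rewarded immediately, so there is no long-term planning as in full reinforcement learning). The weight classes, bin names, reward values, and the function `supervisor_reward`, which stands in for the human monitor, are all hypothetical.

```python
import random

STATES = ["light", "heavy"]           # hypothetical weight classes
ACTIONS = ["bin_light", "bin_heavy"]  # hypothetical bins


def supervisor_reward(state, action):
    """Stand-in for the human monitor: +1 for the correct bin, -1 otherwise."""
    return 1 if action == "bin_" + state else -1


def train(episodes=500, epsilon=0.1, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    # Estimated value of each action in each state; nothing is known at first.
    values = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(episodes):
        state = rng.choice(STATES)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                  # explore: try something random
        else:
            action = max(ACTIONS, key=values[state].get)  # exploit: use the best guess so far
        reward = supervisor_reward(state, action)
        # Move the estimate a step towards the reward just received.
        values[state][action] += learning_rate * (reward - values[state][action])
    return values


values = train()
policy = {s: max(ACTIONS, key=values[s].get) for s in STATES}
print(policy)
```

The machine is never told the sorting rule. It only receives positive or negative signals, and the rule it ends up following is whatever earned the most reward.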

Deep Learning

Deep learning is a subset of machine learning in which we use artificial neural networks to carry out supervised, unsupervised, and reinforcement learning tasks.

Artificial Neural Networks (ANNs) are inspired by the neurons in the human brain. In deep learning, we connect multiple layers of neurons. Each layer learns particular features from its input and passes its output through a function, commonly called an activation function, that determines which features are passed on as input to the next layer, and so on until the final layer produces the output.
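The layer-by-layer flow described above can be sketched as a forward pass through a tiny network. The weights below are hand-picked for illustration (a real network learns them during training), and the sigmoid is just one possible choice of activation function. The function names are our own.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs, adds a bias, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical, hand-picked weights (a trained network would have learned these).
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]  # 2 neurons, 2 inputs each
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]              # 1 neuron, 2 inputs
output_biases = [0.0]

inputs = [1.0, 1.0]
hidden = layer_forward(inputs, hidden_weights, hidden_biases)   # first layer
output = layer_forward(hidden, output_weights, output_biases)   # final layer
print(hidden, output)
```

The output of the hidden layer becomes the input of the output layer, which is exactly the chaining of layers that gives deep learning its name.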

Benefits of using neural networks

A neural network can have many layers, each with a number of neurons, so a single poorly performing neuron usually has little effect on the overall performance. Also, what the network learns (the features identified from the input data) is stored in the network itself in the form of numbers, so we do not need a separate database to store it.

Neural networks can also be adapted to many different tasks. The same kind of network can be used to solve multiple problems, much as our brain does many things by firing different sets of neurons.

The two main areas in which deep learning is used the most are:

Computer vision

Computer vision is a field of artificial intelligence that uses deep learning to learn about the visual world. An image is a collection of pixel values, which the computer represents as numbers in a matrix.

These numbers are fed into a neural network, which learns the features of the image and can then either classify the image or detect an object in it.

A type of neural network called a Convolutional Neural Network (CNN) is commonly used for this. Some of the most common applications of computer vision are:

  • Defect detection in manufacturing
  • Self-driving cars
  • Intruder detection
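To make the matrix view of an image concrete, the sketch below slides a small filter (kernel) over a tiny image, which is the basic operation a convolutional layer performs. The image and the kernel are made up; in a real CNN the kernel values are learned rather than written by hand, and the function name `convolve2d` is our own.

```python
def convolve2d(image, kernel):
    """Slide a small kernel over the image and sum the element-wise products.

    As in most deep learning libraries, the kernel is not flipped, so strictly
    speaking this is a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# A hypothetical 4x4 grayscale image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# A hand-written kernel that responds to left-to-right changes in brightness.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

print(convolve2d(image, vertical_edge))
```

The result is largest in the middle column, where the image changes from dark to bright, which is how a filter can pick out a feature such as an edge.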

Natural Language Processing

Natural Language Processing (NLP) is a field of artificial intelligence that uses neural networks to understand human language in a useful way. NLP can be used to read, understand, and generate natural language. Some of the applications of NLP are:

  • Google Translate
  • MS Word, Grammarly – for grammar check or spellcheck
  • Siri, Alexa – Personal Voice Assistant

If you have decided to go ahead with data science, you can refer to our next article on data science.

Thank you

About the Author

Mr. Deepak Jose

Deepak Jose is a B-Tech CS student with a passion for data science. He loves learning about data science, coding, and science in general, and does data analysis and visualization as a hobby. Even though he is on the Computer Science path, he always finds time to learn about space, automobiles, geography, energy, architecture, arts, etc. He loves solving problems and learning about new inventions.


Know Industrial Engineering Platform – Helping manufacturing industry professionals worldwide since 2019
