
Introduction to Data Science

Figure 1 : Structure of the Article

What is Data Science?

Data Science, as the name suggests, is a field of science that deals with data. It combines the power of computers and mathematics to analyze data, extract important information from it, and process that information into a useful output.

Figure 2 : What is Data science

How can we use Data Science?

There are two ways in which we can use data science:

  1. Finding a solution to a problem by analyzing the data.
  2. Analyzing the data to come up with new ideas that can be implemented, or to identify new problems that can be solved with it.

Classifications of Data Science

Data Science can be classified into the following:

  1. Data Collection
  2. Data Analysis
  3. Data Visualization

We will take a brief look at each of these three…

Data Collection

In philosophy, things that are known or assumed as facts, and that form the basis of reasoning and calculation, are called data. Collecting data is something humans have been doing for ages.

Our ancestors recorded data on rocks and stones to remember the number of their cattle, to preserve memories of their lives, or to pass on the knowledge they had gained to the next generation.

In the modern world, the basic purpose of collecting data is to use it to find solutions to existing problems.

We collect data mainly in these forms:

  • Sound data 
  • Visual data
  • Text data

Types of data

The two main types of data are:

Structured data

Structured data is information that is organized in a fixed layout. For example, a data set that contains names and roll numbers in two different columns.

Unstructured data

Unstructured data is a collection of information that has not been processed or organized into a fixed layout. Examples are IoT sensor data, emails, chats, etc.
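The difference between the two types can be illustrated with a small Python sketch. The names, roll numbers, and chat message below are made-up values used only for illustration.

```python
import csv
import io

# Structured data: every record has the same named columns.
# (Hypothetical names and roll numbers, for illustration only.)
structured_text = "name,roll_number\nAsha,101\nRavi,102\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Because the layout is known, a field can be read directly by column name.
roll_numbers = [int(row["roll_number"]) for row in rows]

# Unstructured data: free text with no predefined columns.
# (Hypothetical chat message.)
chat_message = "hey, the sensor in hall 2 showed 41 degrees again this morning"

# There is no column to query, so the text has to be processed first,
# for example by splitting it into words and picking out the numbers.
numbers_in_text = [int(word) for word in chat_message.split() if word.isdigit()]

print(roll_numbers)
print(numbers_in_text)
```

With structured data the question "what are the roll numbers?" is a direct lookup. With unstructured data we first have to decide how to extract anything useful at all.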


Data Analysis

Now that we have collected the data, we need to analyse it to find the solution to our problem.

Data analysis is the process of analyzing data using tools like R, Python, MATLAB, etc. We can use the libraries available in these programming languages to analyze data, for example by plotting graphs or charts.

For example, consider the problem of housing price prediction. Imagine we have a dataset containing the prices of houses over the past 10 years. We would like to predict the price of the house in the coming year using this data.

One way to do this is to plot a graph with the years on the x-axis and the house prices on the y-axis. Plotted this way, the data shows whether the prices are increasing or decreasing over time.

Using this trend, we can then estimate the likely price of a house in the coming years.
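As a rough sketch of this idea, the Python code below fits a straight trend line through ten yearly prices and extends it by one year. The prices are hypothetical, and a straight line is only one possible assumption about the trend; real housing data may need a different model.

```python
# Hypothetical average house prices (in thousands) for ten consecutive years.
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
prices = [200, 208, 215, 221, 232, 240, 246, 255, 263, 270]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(years, prices)

# Extend the trend line one year past the data to estimate next year's price.
predicted_2021 = slope * 2021 + intercept

print(f"Price change per year: {slope:.2f}")
print(f"Predicted price for 2021: {predicted_2021:.1f}")
```

The slope is the pattern we would see on the graph: a positive slope means prices are rising, a negative one means they are falling.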

Data Visualization

Data visualization is a tool used to explain data through graphical representations. It helps the data analyst understand patterns, outliers, and trends in the data.

The data analyst can also use visualization techniques to present their findings to the customer in the form of graphs, charts, and maps.

Some of the libraries available in Python for data visualization are:

  1. Plotly
  2. Seaborn
  3. Ggplot
  4. Altair
  5. Matplotlib
  6. Bokeh
  7. Folium

If we are not using a programming language for visualization, we can use the tools below:

  1. Google charts
  2. Tableau
  3. Xplenty
  4. Hubspot
  5. Whatagraph

Data visualization example

We shall see an example of data visualization for five machines (A, B, C, D and E) over the period 01-10-2020 to 07-10-2020, done in the Python programming language using the Plotly library.

Figure 3 : Shift wise down time data
Figure 4 : Machine performance
Figure 5: Daily performance

Subsets of Data Science

Figure 6 : Subsets of Data Science

Artificial Intelligence

AI – Artificial Intelligence is the intelligence that enables machines to think like a human and find solutions to problems with little or no human intervention. There are mainly 3 types of AI:

Artificial Narrow Intelligence (ANI)

Narrow AI is the most common form of AI that machines have these days. ANI allows machines to be automated and do a particular task, or a small set of tasks, on their own with very little or no human intervention.

It doesn’t have emotions or consciousness. It cannot do a wide variety of tasks unless it is programmed for them.

Examples:

  • Self-driving cars
  • Autopilot
  • Spam Filters
  • Chatbots

Artificial General Intelligence (AGI)

So far, this type of AI can only be seen in sci-fi movies. It would exhibit human-level intelligence, would be hard to distinguish from a normal human, and would be able to show emotional intelligence.

Such machines could think like human beings and would be able to solve problems based on the situation rather than just system needs. In other words, if a particular solution to a problem might be harmful to someone else, the machine might choose another solution in that situation.

Artificial Super Intelligence (ASI)

This type of AI would have an intelligence level far superior to that of humans and would be able to think much faster than us. Such machines would have greater problem-solving skills and would update themselves, each version more capable than the one before, all in just a matter of days.

They would have the ability to evolve quickly and become better versions of themselves. This type of intelligence could even be a threat to our existence.

Machine Learning

Machine learning is the process of teaching a machine to accept inputs and do calculations, based on algorithms built upon statistics and probability, to come up with an output that is close or equal to the expected output.

We can see machine learning at work in our day-to-day life. For example, the recommendation system on YouTube and the ads on Instagram are based on machine learning: data about what the user clicks and likes the most is fed into a system, which learns the user’s interests and suggests the content the user is most likely to be interested in.

Machine learning is classified mainly into 3 types of learning:

1. Supervised Learning

Let’s say we want our machine to pick out images of apples from a set of other images. In supervised learning, we initially provide the ML model with input images and labels that give the name of the fruit in each image.

An ML model is a set of algorithms that learns different features from input data and gives an output.

The model compares each image with its label and learns the features that map a particular image to a particular label.

When we then give the model a new image, it can identify the same features it saw in the training data and map the image to the matching label.

Common supervised learning problems:

  • Classification: Classification groups the output into categories that were previously given to the model as labels. For example: 0, 1, cat, mouse, apple, mango, etc.
  • Regression: Regression is used to predict a continuous quantity. Examples are predicting the current temperature in a room and predicting stock market prices.
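To illustrate classification, here is a minimal sketch of a supervised classifier in Python. It uses the nearest-neighbour technique, chosen here only because it is short; the fruit measurements are hypothetical, and the function name `predict_label` is made up for this example.

```python
import math

# Labelled training data: (weight in grams, redness score from 0 to 1) -> fruit name.
# All values are hypothetical and chosen only to illustrate the idea.
training_data = [
    ((150.0, 0.90), "apple"),
    ((170.0, 0.80), "apple"),
    ((160.0, 0.85), "apple"),
    ((300.0, 0.30), "mango"),
    ((320.0, 0.20), "mango"),
    ((280.0, 0.25), "mango"),
]


def predict_label(features):
    """Classify by copying the label of the closest training example (1-nearest neighbour)."""
    _, closest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], features),
    )
    return closest_label


# New, unlabelled inputs that the model has not seen before.
print(predict_label((155.0, 0.88)))
print(predict_label((310.0, 0.22)))
```

The labels in `training_data` are what make this supervised: the model can only ever answer with a category it was shown during training.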

2. Unsupervised Learning

In unsupervised learning, the model is provided with input data without any labels. The model would categorize the data into different groups based on similar features. Unsupervised learning is mainly used for two types of problems:

  • Clustering: Clustering identifies features that are similar across the data, and the model itself groups the input data according to these similarities. For example, grouping people into different groups based on the spread of COVID-19 in their area.
  • Association: Association finds relationships between items in the data. For example, associating a particular product with a buyer based on another product they bought recently (mapping).
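Clustering can be sketched in a few lines of Python. The example below uses k-means, one common clustering technique, on made-up case counts. The function name `k_means_1d` and the starting centers are assumptions for this illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around the given starting centers using plain k-means."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Update step: move each center to the mean of its group.
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical number of COVID-19 cases per 1,000 people in eight areas.
cases_per_area = [2, 3, 4, 5, 40, 42, 45, 47]
centers, groups = k_means_1d(cases_per_area, centers=[0, 50])

print(groups)
print(centers)
```

No labels such as "low spread" or "high spread" are given to the model. It separates the areas into two groups purely from how similar the numbers are.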

3. Reinforcement Learning

It’s like teaching a child what’s right and wrong. If the child does something right, we show our appreciation with chocolates, gifts, etc., and if the child does something wrong, we give corrective feedback. The next time, the child knows whether an action is good or bad based on the feedback or rewards received before for the same action.

So, reinforcement learning is a reward-based system in which an agent interacts with an environment by performing actions and learns from the rewards (either negative or positive) it obtains from an interpreter. There is no predefined data and no supervision; the agent follows a trial-and-error method of learning. It has to produce an output by itself, and we only tell it whether that output is right or wrong.

Examples:

Self-driving cars, where the environment is the road and the interpreter (error signal generator) is a human in the driving seat. The human sends a signal based on the direction the car takes on its own, the lane changes it makes, or whether it follows the rules while parking.

An automated machine that sorts products into different groups based on their weight. The person who monitors the task generates a negative error signal if the machine classifies a product wrongly and a positive response if the machine does it correctly.
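The product-sorting example can be turned into a small, heavily simplified Python sketch of reward-based learning. The states, bins, reward values, and learning settings are all hypothetical. A full reinforcement learning setup would also handle sequences of actions, which this sketch leaves out.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

states = ["light_product", "heavy_product"]
actions = ["bin_A", "bin_B"]


def supervisor_feedback(state, action):
    """Hypothetical interpreter: +1 for the correct bin, -1 for the wrong one."""
    correct_bin = "bin_A" if state == "light_product" else "bin_B"
    return 1 if action == correct_bin else -1


# The agent's estimate of how good each action is in each state, initially zero.
value = {(s, a): 0.0 for s in states for a in actions}
learning_rate = 0.5
exploration_rate = 0.1

for _ in range(200):
    state = random.choice(states)
    if random.random() < exploration_rate:
        action = random.choice(actions)  # explore: try something at random
    else:
        action = max(actions, key=lambda a: value[(state, a)])  # exploit the best guess
    reward = supervisor_feedback(state, action)
    # Nudge the estimate toward the reward that was actually received.
    value[(state, action)] += learning_rate * (reward - value[(state, action)])

# After training, the agent picks the action with the highest estimate in each state.
learned_policy = {s: max(actions, key=lambda a: value[(s, a)]) for s in states}
print(learned_policy)
```

The agent is never told which bin is correct in advance. It only receives a positive or negative signal after each attempt, and its estimates drift toward the actions that earned rewards.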

In addition to these, there is another type of learning called semi-supervised learning, in which some of the data is labelled and the rest is unlabelled.

Deep Learning

Deep learning is a subset of machine learning where we use artificial neural networks for doing the supervised, unsupervised, and reinforcement learning tasks.

Artificial Neural Networks (ANNs) are inspired by the neurons in the human brain. In deep learning, we use multiple connected layers of neurons. Each layer learns particular features from its input, and its output is passed through a function (often based on probabilistic equations) that identifies the useful features. The result is passed as input to the next layer, and so on, until it reaches the final layer, where we get an output.
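The layer-by-layer flow described above can be sketched as a tiny forward pass in Python. The weights and biases are hand-picked, hypothetical numbers (a real network would learn them from data), and the sigmoid function is used here as one common choice for the function applied to each neuron's output.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute one layer: each neuron takes a weighted sum of the inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical, hand-picked weights; a real network would learn these from data.
hidden_weights = [[0.5, -0.2], [0.3, 0.8]]  # 2 neurons, each with 2 inputs
hidden_biases = [0.0, -0.1]
output_weights = [[1.0, -1.0]]              # 1 neuron with 2 inputs
output_biases = [0.0]

inputs = [1.0, 2.0]
hidden = layer_forward(inputs, hidden_weights, hidden_biases)  # first layer
output = layer_forward(hidden, output_weights, output_biases)  # final layer

print(hidden)
print(output)
```

The output of the first layer becomes the input of the next one, which is exactly the chaining of layers that gives deep learning its name.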

Benefits of using neural networks

A neural network can have many layers, each with a number of neurons. So even if one neuron isn’t performing well, the model can compensate for it and the overall performance is not affected. Also, what the network has learned (the features identified from the input data) is stored in the network itself in the form of numbers, so we don’t need a separate database to store it.

Neural networks can also be adapted to different tasks. The same kind of network can be used to solve many problems, much like our brain can do lots of things by firing different sets of neurons.

The two main areas in which Deep Learning is used the most are:

Computer vision

Computer vision is a field of artificial intelligence that uses deep learning to learn about the visual world. We know that an image is a collection of pixel values. In the computer, we represent these values as numbers in a matrix.

These numbers are fed into a neural network, which learns the features of the image and can then either classify the image or detect an object in it.

A type of neural network called a Convolutional Neural Network (CNN) is commonly used for this. Some of the most common applications of computer vision are:

  • Defect detection in manufacturing
  • Self-driving cars
  • Intruder detection

Natural Language Processing

NLP is a field of artificial intelligence that uses the power of neural networks to understand human language in a useful way. NLP can be used to read, understand, and create natural language. Some of the applications of NLP are:

  • Google Translate
  • MS Word, Grammarly – for grammar check or spellcheck
  • Siri, Alexa – Personal Voice Assistant

If you have decided to go ahead with data science, you can refer to our next article on data science.

Thank you


About the Author

Mr. Deepak Jose

Deepak Jose is a B-Tech CS student with a passion for Data Science. He loves learning about Data Science, coding, and science in general, and does data analysis and visualization as a hobby. Even though he is on the Computer Science path, he always finds time to learn about space, automobiles, geography, energy, architecture, arts, etc. He loves solving problems and learning about new inventions.

