Data Science Basics
This page covers the following topics:
- Data Science Introduction
- Flow of this blog
- Data Science Applications
- Data Science Fields
- Data Science: The Entire Process
1. Data Science Introduction:
As the world generates more and more data, everyone wants to put that data to good use. Extracting useful information from large volumes of data became a trend, and that trend was later named Data Science. A commonly cited definition is: “Data Science is a set of scientific methods, processes, and systems for extracting knowledge and insights from data.” This data can be either structured or unstructured. As a rough shorthand, Data Science = Maths + Statistics.
2. Flow of this blog
Below is the flow of this blog:
- Statistics
- Analytics
- Machine Learning (R & Python)
- Visualization Tool (Tableau)
- Databases
3. Data Science Applications:
Below are some applications of Data Science. In the future, it is likely to be everywhere, in ways beyond our imagination.
- Fraud and Risk Detection.
- Healthcare.
- Internet Search.
- Targeted Advertising.
- Website Recommendations.
- Advanced Image Recognition.
- Speech Recognition.
- Airline Route Planning.
4. Data Science Fields:
Below are a few fields in Data Science:
- Image Processing
- Video Processing
- Medicine Simulation
- Material Simulation
- Speech Analysis
- Text Analysis (Natural Language Processing)
5. Data Science: The Entire Process
Below is the workflow of a Data Scientist.
Understand the Business Problem or Business Objective
The business problem or Business Objective must be understood correctly. To pin down the exact problem, a Data Scientist asks a lot of questions (the why's), conducts interviews, or prepares questionnaires for the client. Understanding the problem correctly is roughly half of the solution. The rest comes from analyzing the problem and predicting the right solution.
Data Gathering
Once the problem or Business Objective has been identified correctly, the Data Scientist tries hard to collect all related records before proceeding. The relevant data can be DB logs, online data entries, or web logs. Structured or unstructured, any form of relevant data is gathered for analysis.
Data Preparation
Data preparation means cleaning the data. After Data Gathering, we perform the steps below to prepare the data for further analysis:
- Remove Unwanted Observations
- Fix Structural Errors
- Filter Unwanted Outliers
- Handle Missing & Duplicate Data
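The four steps above can be sketched in plain Python. This is only a minimal illustration, not a standard recipe: the records, the field names (`city`, `segment`, `amount`), and the "five times the median" outlier rule are all assumptions made up for this example. Real projects often use other outlier rules, such as the IQR or z-score.

```python
from statistics import median

# Hypothetical raw records; field names and values are made up for illustration.
raw_rows = [
    {"city": "Pune ", "segment": "retail", "amount": 120.0},
    {"city": "pune", "segment": "retail", "amount": 120.0},      # duplicate once cleaned
    {"city": "Delhi", "segment": "retail", "amount": None},      # missing value
    {"city": "DELHI", "segment": "retail", "amount": 95.0},
    {"city": "Mumbai", "segment": "test", "amount": 10.0},       # unwanted observation
    {"city": "Mumbai", "segment": "retail", "amount": 110.0},
    {"city": "Chennai", "segment": "retail", "amount": 9999.0},  # likely outlier
]

def clean(rows, outlier_factor=5.0):
    # 1. Remove unwanted observations (here: internal test records).
    rows = [r for r in rows if r["segment"] != "test"]

    # 2. Fix structural errors (stray whitespace, inconsistent capitalization).
    rows = [{**r, "city": r["city"].strip().title()} for r in rows]

    # 3. Filter unwanted outliers with a deliberately simple rule (an assumption):
    #    drop amounts larger than outlier_factor times the median.
    known = [r["amount"] for r in rows if r["amount"] is not None]
    limit = outlier_factor * median(known)
    rows = [r for r in rows if r["amount"] is None or r["amount"] <= limit]

    # 4a. Handle missing data: fill gaps with the median of the remaining values.
    fill = median([r["amount"] for r in rows if r["amount"] is not None])
    rows = [{**r, "amount": fill if r["amount"] is None else r["amount"]} for r in rows]

    # 4b. Handle duplicate data: keep the first occurrence of each record.
    seen, unique = set(), []
    for r in rows:
        key = (r["city"], r["segment"], r["amount"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

The order of the steps matters in this sketch: the outlier is removed before the missing value is filled, so the extreme amount does not influence the fill value.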
Data Transformation
Data Transformation means converting data from one format into another required format. Tools such as Talend and Informatica help at this stage.
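As a small illustration of a format conversion, the sketch below turns CSV text into JSON using only Python's standard library. It does not show how Talend or Informatica work. The column names (`order_id`, `city`, `amount`) and the sample rows are hypothetical.

```python
import csv
import io
import json

# Hypothetical CSV export; in practice this would come from a file or a database.
csv_text = "order_id,city,amount\n1,Pune,120.0\n2,Delhi,95.0\n"

def csv_to_json(text):
    """Convert CSV text into a JSON array of typed records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "order_id": int(row["order_id"]),  # CSV values are text; convert to integer
            "city": row["city"],
            "amount": float(row["amount"]),    # convert text to float
        })
    return json.dumps(records, indent=2)

json_text = csv_to_json(csv_text)
print(json_text)
```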
Data Analysis
Deep analysis of the data takes place at this stage. The feature variables are selected and refined here. Statistics and Analytics help us understand the data better at this stage. We will learn the statistical techniques in upcoming posts.
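One simple way statistics can help with choosing feature variables is to measure how strongly each candidate feature moves with the target. The sketch below computes the Pearson correlation by hand. The data, the feature names, and the 0.5 cut-off are assumptions for illustration only. Correlation captures only linear relationships, so treat this as a first screening step rather than a complete method.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical target and candidate features.
sales = [10, 20, 30, 40, 50]
features = {
    "ad_spend": [1, 2, 3, 4, 5],
    "day_of_month": [3, 1, 4, 1, 5],
}

# Score every feature against the target.
scores = {name: pearson(values, sales) for name, values in features.items()}

# Keep features whose absolute correlation passes an assumed threshold of 0.5.
selected = [name for name, r in scores.items() if abs(r) >= 0.5]

for name, r in scores.items():
    print(f"{name}: {r:.3f}")
print("selected:", selected)
```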
Data Modeling
In this stage, the best-fit Machine Learning algorithm is selected, and the selected algorithm is applied to the data to check whether it is the best fit for the business.
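A minimal sketch of picking the best-fit model: train each candidate on one part of the data, then compare their errors on data they have not seen. The two candidates here (a mean baseline and a least-squares straight line) and all the numbers are assumptions chosen to keep the example small. Real projects compare proper ML algorithms, usually through a library.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline model: always predict the average of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Simple linear regression fitted by least squares."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

# Hypothetical data, split into a training part and a held-out test part.
train_x, train_y = [1, 2, 3, 4, 5, 6], [12, 19, 31, 42, 48, 61]
test_x, test_y = [7, 8], [70, 82]

candidates = {"mean_baseline": fit_mean, "straight_line": fit_line}

# Fit each candidate on the training data and score it on the held-out data.
scores = {name: mse(fit(train_x, train_y), test_x, test_y)
          for name, fit in candidates.items()}

# The candidate with the lowest held-out error is the best fit in this sketch.
best_name = min(scores, key=scores.get)
print(scores)
print("best fit:", best_name)
```

Scoring on held-out data rather than on the training data is the key design choice here, because a model can look good on data it has already seen and still predict poorly on new data.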
Visualization and Communication
In this stage, clear and compelling reports are created using tools like Tableau, Power BI, and QlikView. These reports are shown to the customer to explain how the Business Problem will be solved with the best-fit ML model.
Deploy, Test, and Maintain the ML Model
The final stage involves deploying and testing the ML model, and then maintaining it.
This is how a Data Scientist solves a Business Problem. Upcoming posts will explain all these steps in detail.
The next post covers:
- Statistics: Descriptive and Inferential
- Mean, Median, Mode
- Correlation, Covariance, Collinearity