Ephesians 2:10 (ESV)

“For we are His workmanship, created in Christ Jesus for good works, which God prepared beforehand, that we should walk in them.”


Definition of Data Science


IBM says: “Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI) and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data. These insights can be used to guide decision making and strategic planning.”

Data scientists are the practitioners within the field of data science.

Data scientists may also work with data engineers, machine learning engineers, software engineers, and data analysts to produce a useful product.

Public servants in government & business leaders both leverage data to make productive decisions with their budgets. Companies such as Walmart, Amazon, & Netflix rely on data to generate valuable insights. The federal Social Security Administration uses analytical models to improve claims processing. Law enforcement is beginning to draw on the insights that data scientists & analysts bring to the table. Scientists, like those submitting work to the National Institutes of Health, can choose to share their data so that it can be reused for additional insights.

Recommended limits on what data scientists can share can be found here.


Data Science vs. Machine Learning vs. Artificial Intelligence


Within the technology field today, data scientists, machine learning engineers, & artificial intelligence engineers are in demand.

While the public often treats these titles & fields as interchangeable, there are key differences.

Similarities:

(1) Each of these three fields builds a foundation in the collection, organization, and analysis of data.

(2) Iterative Evolution: each of these fields refines its algorithms & models through constant iteration & improvement.

(3) The Pursuit of Predictive Power: each field aims to predict, whether by forecasting future trends (data science), making informed guesses from patterns that algorithms discern (machine learning), or anticipating preferences & behaviors (artificial intelligence).

Differences:

The field of artificial intelligence extends beyond data manipulation into areas such as robotics, computer vision, & natural language modeling. A.I. aims to produce machines that perform tasks. These machine-performed tasks are often meant to replace a human being who would otherwise do the same work.

Machine learning emphasizes having machines make predictions based on patterns within data & often involves more software development than a data scientist would typically do. The machine learning engineer wants to design & implement a system that allows the computer to make its own decisions based on the data it is fed.
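
To make "predictions based on patterns within data" concrete, here is a minimal sketch of a nearest-neighbor classifier, one of the simplest machine learning techniques. The training data, the feature meanings, and the `predict` function are all made up for illustration; a real project would more likely reach for a library such as Scikit-learn.

```python
import math

# Hypothetical labeled examples (assumption, not real data):
# (hours of daily use, messages sent per day) -> customer type.
TRAINING_DATA = [
    ((1.0, 5.0), "casual"),
    ((1.5, 8.0), "casual"),
    ((6.0, 40.0), "power user"),
    ((7.0, 55.0), "power user"),
]


def predict(features):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        TRAINING_DATA,
        key=lambda example: math.dist(example[0], features),
    )
    return nearest_label


if __name__ == "__main__":
    # A new, unlabeled customer is assigned the label of its nearest neighbor.
    print(predict((2.0, 10.0)))
    print(predict((6.5, 50.0)))
```

Nothing in `predict` hard-codes a rule such as "more than 20 messages means power user"; the decision comes entirely from the examples the program is fed, which is the distinction the paragraph above draws.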


IBM’s Data Life Cycle


IBM provides this information about their four-stage approach to the Data Life Cycle:

  • Stage 1: Data Ingestion
    • The lifecycle begins here with data collection.
    • The data is raw & can be structured or unstructured.
    • Data is collected through a variety of methods: manual entry, web scraping, real-time streaming, etc.
    • The sources of data can include: customer data, log files, video, audio, pictures, IoT, social media, and more.
  • Stage 2: Data Storage & Data Processing
    • Data is stored & structured by data management teams.
    • The different formats & structures of data collected will influence the type of storage method utilized.
      • Data warehouses, data lakes, or other repositories are utilized.
    • Cleaning, deduplicating, transforming, & combining the data using Extract/Transform/Load (ETL) are all aspects of this stage.
  • Stage 3: Data Analysis
  • Stage 4: Communicate
    • Reports & other data visualizations help decision-makers understand the value of the data.
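
The cleaning, deduplicating, transforming, & combining described in Stage 2 can be sketched with nothing but Python's standard library. This is a minimal illustration under stated assumptions, not IBM's method: the two CSV snippets, the table names, and every function name below are hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw exports (assumption); real data would come from files, APIs, or streams.
RAW_CUSTOMERS = """customer_id,name,state
1, Ada Lovelace ,tx
2,Grace Hopper,NY
2,Grace Hopper,NY
3,Alan Turing,
"""

RAW_ORDERS = """order_id,customer_id,amount
100,1,25.50
101,2,10.00
102,2,5.25
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform_customers(rows):
    """Transform: trim whitespace, normalize case, drop incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (int(row["customer_id"]), row["name"].strip(), row["state"].strip().upper())
        if not record[2]:   # cleaning: skip rows with a missing state
            continue
        if record in seen:  # deduplicating: skip exact repeats
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


def load(connection, customers, orders):
    """Load: write the cleaned rows into relational tables."""
    connection.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT, state TEXT)")
    connection.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    connection.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)


def run_pipeline():
    """Run extract, transform, and load, then combine both sources in one query."""
    customers = transform_customers(extract(RAW_CUSTOMERS))
    orders = [
        (int(r["order_id"]), int(r["customer_id"]), float(r["amount"]))
        for r in extract(RAW_ORDERS)
    ]
    connection = sqlite3.connect(":memory:")
    load(connection, customers, orders)
    # Combining: total spend per customer across the two sources.
    query = """
        SELECT c.name, c.state, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id
        ORDER BY c.customer_id
    """
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    for name, state, total in run_pipeline():
        print(name, state, total)
```

The duplicate "Grace Hopper" row & the "Alan Turing" row with no state are dropped during the transform step, so only clean records reach storage.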

Five Stages of the Data Life Cycle


The UC Berkeley School of Information provides this information about their five-stage approach to the Data Life Cycle:

  • Stage 1: Capture
    • Data acquisition, data entry, signal reception, data extraction
  • Stage 2: Maintain
    • Data warehousing, data cleansing, data staging, data processing, data architecture
  • Stage 3: Process
    • Data mining, clustering/classification, data modeling, data summarization
  • Stage 4: Analyze
    • Exploratory/confirmatory, predictive analysis, regression, text mining, qualitative analysis
  • Stage 5: Communicate
    • Data reporting, data visualization, business intelligence, decision making
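
The five stages above can be walked through end to end on a toy problem. The sketch below is only an illustration: the study-hours data is invented, the function names simply mirror the stage names, and a hand-written least-squares fit stands in for the "regression" listed under Stage 4.

```python
from statistics import mean

# Stage 1 (Capture): hypothetical raw readings of "hours studied,exam score".
RAW_LINES = ["1,52", "2,54", "3,56", "bad row", "4,58", "5,60"]


def maintain(lines):
    """Stage 2 (Maintain): keep only the rows that parse as two numbers."""
    rows = []
    for line in lines:
        parts = line.split(",")
        try:
            rows.append((float(parts[0]), float(parts[1])))
        except (ValueError, IndexError):
            continue  # data cleansing: discard malformed rows
    return rows


def process(rows):
    """Stage 3 (Process): summarize the cleaned data."""
    return {
        "n": len(rows),
        "mean_x": mean(x for x, _ in rows),
        "mean_y": mean(y for _, y in rows),
    }


def analyze(rows, summary):
    """Stage 4 (Analyze): fit y = slope * x + intercept by ordinary least squares."""
    mean_x, mean_y = summary["mean_x"], summary["mean_y"]
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def communicate(summary, slope, intercept):
    """Stage 5 (Communicate): turn the numbers into a sentence a decision-maker can use."""
    return (
        f"Across {summary['n']} students, each extra hour of study is associated "
        f"with about {slope:.1f} more points (baseline {intercept:.1f})."
    )


if __name__ == "__main__":
    rows = maintain(RAW_LINES)
    summary = process(rows)
    slope, intercept = analyze(rows, summary)
    print(communicate(summary, slope, intercept))
```

Keeping each stage in its own function mirrors the life cycle itself: the output of one stage is the only input the next stage needs.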

“Today, effective data scientists masterfully identify relevant questions, collect data from a multitude of different data sources, organize the information, translate results into solutions, and communicate their findings in a way that positively affects business decisions.

These skills are now required in almost all industries, which means data scientists have become increasingly valuable to companies.” – U.C. Berkeley


Those in the data science field utilize tools & languages such as:

Python, R, SQL & SQL variants such as PostgreSQL, Tableau, Power BI, Bokeh, Plotly, Infogram, Excel, Apache Spark, TensorFlow, MLflow, PyTorch, RapidMiner, and Hugging Face.


Abid Ali Awan, blogger for DataCamp, believes the Top Ten Data-Science Tools for 2024 are:

(1) Pandas

(2) Seaborn

(3) Scikit-learn

(4) Jupyter Notebooks

(5) PyTorch

(6) MLflow

(7) Hugging Face

(8) Tableau

(9) RapidMiner

(10) ChatGPT



Cloud Computing’s Role for Data Science


Access to cloud computing gives data scientists “additional processing power, storage, and other tools . . . .”

Scalability is vital for data sets that can change & grow large in a time-sensitive manner. The cloud can provide access to data lakes, which allow large volumes of data to be ingested & processed with ease. Additional compute nodes can be added as needed: a short-term cost with a potential long-term payoff.

