
What Is Data? (Types, Formats & Terminology)

🧠 Core Definition

Data refers to characteristics or information, often numerical, collected through observation. Almost anything, from physical measurements to digital media, can be represented as data.


📂 Types of Data by Structure

| Type | Description | Examples |
| --- | --- | --- |
| Structured Data | Data organized into predefined fields (rows + columns) | Excel files, relational databases |
| Unstructured Data | Data without a predefined format or schema | Text, images, audio, video, PDFs |

📊 Ratio in Organizations (a commonly cited rough estimate, not a precise measurement):

  • ~80% of an organization's data is unstructured
  • ~20% is structured
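
To make the distinction concrete, here is a minimal sketch using only the Python standard library. The CSV contents, field names (`house_id`, `bedrooms`, `price`), and the sample sentence are hypothetical and chosen purely for illustration.

```python
import csv
import io

# Structured: every record follows the same predefined fields (columns).
structured_csv = """house_id,bedrooms,price
1,3,250000
2,4,320000
"""
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Each value can be addressed by field name because the schema is known up front.
prices = [int(row["price"]) for row in rows]

# Unstructured: free text with no predefined fields. Any structure has to be
# extracted by the reader (here, a naive whitespace word count).
unstructured_text = "Lovely three-bedroom house near the park, recently renovated."
word_count = len(unstructured_text.split())
```

The structured rows can be queried by column name right away, while the text needs some form of parsing before it yields anything comparable.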

🔢 Types of Data by Format

| Type | Description | Examples |
| --- | --- | --- |
| Continuous Data | Numeric values on a continuous scale | Height, weight, temperature, time |
| Categorical Data | Finite groupings, may or may not be ordered | Gender, colors, student major |
| Discrete Data | Countable numeric values | Age in whole years, number of items, production year |
| Time Series Data | Data indexed by time (ordered, typically recorded at successive intervals) | Sensor data, stock prices, usage logs |
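
The format of a column determines which summaries make sense for it. The sketch below illustrates this with the standard library only; the sample values and variable names are hypothetical.

```python
from collections import Counter
from statistics import mean

heights_cm = [172.4, 165.0, 180.2, 158.7]               # continuous: any value on a scale
items_per_order = [1, 3, 2, 3, 5]                       # discrete: countable whole numbers
majors = ["physics", "history", "physics", "biology"]   # categorical: finite groups

# Continuous values support arithmetic summaries such as the mean.
mean_height = mean(heights_cm)

# Discrete values can be summed or averaged, but each value is a whole-number count.
total_items = sum(items_per_order)

# Categorical values are summarized by counting group membership, not averaging.
major_counts = Counter(majors)
most_common_major = major_counts.most_common(1)[0][0]
```

Averaging `majors` would be meaningless, which is why categorical columns are usually counted or encoded before modeling.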

🕒 Time Series Assumptions:

  • Time moves forward, not backward
  • Points closer in time tend to be more closely related than points far apart
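
Both assumptions can be sketched in a few lines. The readings below are hypothetical, and `autocorrelation` is an illustrative helper (a plain sample autocorrelation), not a function from any particular library.

```python
from statistics import mean

# Hypothetical hourly sensor readings, already sorted by time (oldest first).
readings = [20.0, 20.5, 21.0, 21.8, 22.5, 23.0, 23.2, 23.1, 22.6, 22.0]

# Assumption 1: time moves forward. Split chronologically rather than
# shuffling, so nothing from the "future" leaks into the training portion.
split = int(len(readings) * 0.8)
train, test = readings[:split], readings[split:]

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    num = sum((series[i] - mu) * (series[i + lag] - mu)
              for i in range(len(series) - lag))
    return num / denom

# Assumption 2: nearby points are more related. For a smooth series like this
# one, the lag-1 autocorrelation is higher than the autocorrelation at lag 4.
lag1 = autocorrelation(readings, 1)
lag4 = autocorrelation(readings, 4)
```

A random shuffle before splitting would violate the first assumption, which is why time series are usually split by position in time.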

📊 Terminology in Structured Data for ML

| Element | Meaning | Also Known As |
| --- | --- | --- |
| Observation | A row of data (e.g., a single house) | Instance, example, feature vector |
| Feature | A column used for prediction | Attribute, X variable, predictor, input |
| Target | The column we want to predict | Label, Y variable, response, dependent variable |
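
The terms above map directly onto how a table is prepared for a model. This is a minimal sketch that follows the house example; the column names and values are hypothetical.

```python
# Hypothetical housing table: each dict is one observation (row).
houses = [
    {"bedrooms": 3, "area_sqm": 120.0, "price": 250_000},
    {"bedrooms": 4, "area_sqm": 150.5, "price": 320_000},
    {"bedrooms": 2, "area_sqm": 80.0, "price": 180_000},
]

feature_names = ["bedrooms", "area_sqm"]  # columns used for prediction (X)
target_name = "price"                     # column we want to predict (y)

# X holds one feature vector per observation; y holds the matching targets.
X = [[row[name] for name in feature_names] for row in houses]
y = [row[target_name] for row in houses]
```

Here `X[0]` is the feature vector of the first house and `y[0]` is its target, so the two lists stay aligned row by row.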

🧩 Other Data Relationships

  • Spatial: Location-based proximity (e.g., GPS, maps)
  • Temporal: Time-based proximity (e.g., timestamps, logs)
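
Both kinds of proximity can be measured directly. The sketch below uses the haversine formula for spatial distance (one common choice, treating the Earth as a sphere) and a timestamp difference for temporal distance. The coordinates and timestamps are illustrative values only.

```python
import math
from datetime import datetime

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km (spherical approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Spatial proximity: distance between two GPS coordinates.
paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
distance_km = haversine_km(*paris, *london)

# Temporal proximity: elapsed time between two log timestamps.
t1 = datetime(2024, 1, 1, 12, 0, 0)
t2 = datetime(2024, 1, 1, 12, 5, 30)
gap_seconds = (t2 - t1).total_seconds()
```

Smaller values of `distance_km` or `gap_seconds` indicate records that are closer in space or time, which is the kind of relationship a model can exploit.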