What Is Data? (Types, Formats & Terminology)
🧠 Core Definition
Data is recorded information about the characteristics of something, often numerical, and typically collected through observation or measurement. Almost anything, from physical measurements to digital media, can be represented as data.
📂 Types of Data by Structure
Type | Description | Examples |
---|---|---|
Structured Data | Data organized into predefined fields (rows + columns) | Excel files, relational databases |
Unstructured Data | Data without a predefined format or schema | Text, images, audio, video, PDFs |
📊 Ratio in Organizations (a commonly cited rough estimate, not an exact measurement):
- ~80% of data is unstructured
- ~20% of data is structured
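To make the structured/unstructured distinction concrete, here is a minimal sketch using only the Python standard library. The CSV columns (`house_id`, `bedrooms`, `price`) and the sample sentence are made up for illustration.

```python
import csv
import io

# Structured: every record has the same predefined fields (columns).
# Column names and values are hypothetical.
structured_csv = """house_id,bedrooms,price
1,3,250000
2,4,320000
"""
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Because the schema is known, a field can be addressed directly by name.
bedrooms = [int(row["bedrooms"]) for row in rows]

# Unstructured: free text with no predefined fields.
unstructured_text = "Lovely 3-bed house near the park, recently renovated."

# Nothing in the data says where the bedroom count lives;
# the text has to be parsed or modeled before it can be queried.
word_count = len(unstructured_text.split())
```

The same fact (three bedrooms) appears in both forms, but only the structured version can be looked up by field name without extra processing.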
🔢 Types of Data by Format
Type | Description | Examples |
---|---|---|
Continuous Data | Numeric values on a continuous scale | Height, weight, temperature, time |
Categorical Data | A finite set of groups, either unordered (nominal) or ordered (ordinal) | Gender, colors, student major |
Discrete Data | Countable numeric values | Age in whole years, number of items, production year |
Time Series Data | Data points indexed by time and kept in chronological order | Sensor data, stock prices, usage logs |
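As a rough illustration of the table above, the sketch below pairs each data type with an operation that usually makes sense for it. The variable names and values are invented for this example.

```python
from collections import Counter
from statistics import mean

# Hypothetical student records, invented for illustration.
heights_cm = [162.5, 170.2, 181.0]    # continuous: any value on a scale
majors = ["math", "biology", "math"]  # categorical: a finite set of groups
items_bought = [2, 0, 5]              # discrete: countable whole numbers

# Averaging is meaningful for continuous values.
mean_height = mean(heights_cm)

# Categories are counted, not averaged.
major_counts = Counter(majors)

# Discrete counts can be summed exactly.
total_items = sum(items_bought)
```

The type of a column largely determines which summaries and models are appropriate, which is why it is worth identifying before any analysis.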
🕒 Time Series Assumptions:
- Time moves forward, not backward
- Points closer together in time tend to be more closely related than points far apart
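The two assumptions above can be sketched on a small, made-up series of hourly temperature readings. The helper `mean_abs_diff` is a hypothetical name introduced here, and the result only shows that the second assumption holds for this particular toy series.

```python
from datetime import datetime

# Hypothetical hourly temperature readings, deliberately out of order.
readings = [
    (datetime(2024, 1, 1, 2), 11.0),
    (datetime(2024, 1, 1, 0), 10.0),
    (datetime(2024, 1, 1, 3), 11.8),
    (datetime(2024, 1, 1, 1), 10.4),
    (datetime(2024, 1, 1, 4), 12.9),
]

# Assumption 1: time moves forward, so put the points in chronological order.
readings.sort(key=lambda r: r[0])
values = [value for _, value in readings]


def mean_abs_diff(series, lag):
    """Average absolute difference between points `lag` steps apart."""
    diffs = [abs(series[i + lag] - series[i]) for i in range(len(series) - lag)]
    return sum(diffs) / len(diffs)


# Assumption 2: nearby points are more alike than distant ones.
near = mean_abs_diff(values, lag=1)  # points one hour apart
far = mean_abs_diff(values, lag=3)   # points three hours apart
```

For this series, `near` is smaller than `far`, meaning readings one hour apart differ less on average than readings three hours apart.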
📊 Terminology in Structured Data for ML
Element | Meaning | Also Known As |
---|---|---|
Observation | A row of data (e.g., a single house) | Instance, example, feature vector |
Feature | A column used as an input for prediction | Attribute, X variable, predictor, input |
Target | The column we want to predict | Label, Y variable, response, dependent variable |
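The terminology in the table can be sketched by splitting a small table into features and a target. The housing columns (`bedrooms`, `area_sqft`, `price`) and their values are assumptions made for this example, following the "single house" observation mentioned above.

```python
# Hypothetical housing table; column names and values are illustrative only.
columns = ["bedrooms", "area_sqft", "price"]
table = [
    [3, 1400, 250000],  # one observation (row) = one house
    [4, 2000, 320000],
    [2, 900, 180000],
]

# The target is the column we want to predict.
target_name = "price"
target_idx = columns.index(target_name)

# Every other column is a feature.
feature_names = [name for name in columns if name != target_name]

# X holds one feature vector per observation; y holds one target value per observation.
X = [[value for i, value in enumerate(row) if i != target_idx] for row in table]
y = [row[target_idx] for row in table]
```

`X` and `y` are the conventional names for the feature matrix and target vector, matching the "X variable" and "Y variable" aliases in the table.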
🧩 Other Data Relationships
- Spatial: Location-based proximity (e.g., GPS, maps)
- Temporal: Time-based proximity (e.g., timestamps, logs)
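Both kinds of proximity can be expressed as a distance. The sketch below uses the haversine formula for spatial distance, assuming a spherical Earth with a radius of 6371 km, and a plain timestamp difference for temporal distance. The function name `haversine_km`, the approximate city coordinates, and the timestamps are all chosen for illustration.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * asin(sqrt(a))


# Spatial proximity: approximate (latitude, longitude) for Paris and London.
paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
distance_km = haversine_km(*paris, *london)

# Temporal proximity: the gap between two timestamps, in seconds.
t1 = datetime(2024, 1, 1, 12, 0)
t2 = datetime(2024, 1, 1, 12, 30)
gap_seconds = abs((t2 - t1).total_seconds())
```

Smaller values of `distance_km` or `gap_seconds` mean the two records are closer in space or time, which is the notion of proximity these relationships rely on.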