Data Types

Nerd Cafe

In Python, every value has a type. A data type defines what kind of value something is and what you can do with it. Think of it like this:

  • A string holds text.

  • An int holds whole numbers.

  • A float holds decimal numbers.

  • A list holds a collection of items.

  • And so on...

Knowing data types is critical in machine learning because models expect specific kinds of data: numeric features for training, labels that often start out as strings, and structured containers such as lists and arrays.

Built-in Data Types in Python (Core Types)

1. Numeric Types

  • int: Integer (whole number)

  • float: Floating point (decimal)

  • complex: Complex number

2. Text Type

  • str: String

3. Sequence Types

  • list: Ordered, changeable (mutable), allows duplicates

  • tuple: Ordered, unchangeable (immutable)

  • range: Immutable sequence of integers, most often used to drive for loops

4. Mapping Type

  • dict: Key-value pairs

5. Set Types

  • set, frozenset

6. Boolean Type

  • bool: True or False

7. Binary Types

  • bytes, bytearray, memoryview
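As a quick check of the list above, the sketch below builds one value of each core type and asks Python for its type name. The literal values are arbitrary and chosen only for illustration.

```python
# One example value per built-in type from the list above.
examples = [
    42,                   # int
    3.14,                 # float
    2 + 3j,               # complex
    "hello",              # str
    [1, 2, 3],            # list
    (1, 2, 3),            # tuple
    range(3),             # range
    {"a": 1},             # dict
    {1, 2, 3},            # set
    frozenset({1, 2}),    # frozenset
    True,                 # bool
    b"abc",               # bytes
    bytearray(b"abc"),    # bytearray
    memoryview(b"abc"),   # memoryview
]

# type(value).__name__ gives the short name of each value's type.
type_names = [type(value).__name__ for value in examples]

for value, name in zip(examples, type_names):
    print(f"{name:<12} {value!r}")
```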

Practical Examples + Notes for ML

1. Numbers: int, float, complex

ML Note:

When feeding data to machine learning models:

  • Use int or float.

  • Avoid complex unless working with signal processing or advanced math.
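A minimal sketch of the three numeric types follows. The variable names (n_samples, learning_rate, z) are illustrative and not taken from any particular library.

```python
n_samples = 100         # int: a whole-number count
learning_rate = 0.01    # float: a decimal value
z = 3 + 4j              # complex: real part 3, imaginary part 4

# True division always produces a float, even for two ints.
ratio = n_samples / 8

# Floor division between two ints produces an int.
full_batches = n_samples // 8

# abs() of a complex number is its magnitude, returned as a float.
magnitude = abs(z)

print(type(ratio).__name__, ratio)
print(type(full_batches).__name__, full_batches)
print(type(magnitude).__name__, magnitude)
```

Note that mixing int and float in arithmetic yields a float, which is usually what numeric ML code expects.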

2. String: str

Strings are used to hold text like:

  • Category labels ("spam", "ham")

  • Column names in pandas

  • File paths ("data/train.csv")

ML Note: Use label encoding or one-hot encoding to convert strings into numbers for ML models.
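To make the two encodings concrete, here is a hand-rolled sketch in plain Python. In practice a library encoder would usually do this; the variable names and the alphabetical class order are assumptions made for the example.

```python
labels = ["spam", "ham", "ham", "spam"]

# Label encoding: map each distinct string to an integer ID.
classes = sorted(set(labels))  # alphabetical order: ['ham', 'spam']
label_to_id = {name: index for index, name in enumerate(classes)}
encoded = [label_to_id[name] for name in labels]

# One-hot encoding: one 0/1 column per class, with a single 1 per row.
one_hot = [
    [1 if label_to_id[name] == index else 0 for index in range(len(classes))]
    for name in labels
]

print(encoded)
print(one_hot)
```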

3. List: list

  • Lists hold ordered items.

  • Can hold mixed types, but that’s discouraged in ML input.

ML Use Case:

  • Store a row of features.

  • Hold dataset samples before converting to NumPy or pandas.
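The use cases above might look like the following sketch, where each inner list is one row of features. The numbers are made up for illustration.

```python
# Each inner list is one sample: a row of float features.
dataset = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
]

# Lists are mutable, so samples can be appended as they arrive.
dataset.append([6.2, 3.4, 5.4, 2.3])

n_samples = len(dataset)
n_features = len(dataset[0])

# A list comprehension pulls out one feature (column) across all rows.
first_column = [row[0] for row in dataset]

print(n_samples, n_features, first_column)
```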

4. Tuple: tuple

  • Similar to list, but immutable (cannot be changed).

  • Often used for coordinates and other fixed-size data.

ML Note: Use a tuple for fixed data that should not change (e.g., an image shape such as (224, 224, 3)).
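A short sketch of tuple unpacking and immutability, using the image shape mentioned above:

```python
image_shape = (224, 224, 3)

# Tuple unpacking assigns each element to its own name.
height, width, channels = image_shape

# Tuples are immutable: item assignment raises TypeError.
error_message = None
try:
    image_shape[0] = 256
except TypeError as exc:
    error_message = str(exc)

print(height, width, channels)
print(error_message)
```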

5. Dictionary: dict

  • Holds key-value pairs.

  • Fast lookup, widely used for configurations and mappings.

ML Use:

  • Store model settings: {"learning_rate": 0.01}

  • Mapping labels: {"cat": 0, "dog": 1}
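The two uses above can be sketched as follows. The "epochs" and "batch_size" keys are hypothetical additions for illustration only.

```python
# Model settings stored by name.
config = {"learning_rate": 0.01}
config["epochs"] = 10  # hypothetical setting, added after creation

# .get() returns a default instead of raising KeyError for a missing key.
batch_size = config.get("batch_size", 32)  # hypothetical key and default

# Label mapping, plus the reverse mapping for decoding predictions.
label_map = {"cat": 0, "dog": 1}
id_to_label = {index: name for name, index in label_map.items()}

print(config, batch_size)
print(label_map["dog"], id_to_label[0])
```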

6. Set: set

  • Unordered, no duplicates.

  • Useful to remove duplicates or check membership.

ML Tip: Use set() to find unique classes in your target column:
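A minimal sketch, assuming the target column is available as a plain Python list (the class names are made up):

```python
targets = ["cat", "dog", "cat", "bird", "dog"]

# set() drops duplicates; the resulting order is not guaranteed.
unique_classes = set(targets)
n_classes = len(unique_classes)

# Membership tests on a set are fast.
has_bird = "bird" in unique_classes

# Sort when a stable, repeatable class order is needed.
sorted_classes = sorted(unique_classes)

print(n_classes, has_bird, sorted_classes)
```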

7. Boolean: bool

Used in:

  • Conditional statements

  • Controlling training loops

  • Evaluation (e.g., accuracy > 0.9)

ML Use:
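As one illustration, a boolean flag can decide when a training loop stops. The loop below is a toy: the accuracy update is a stand-in for real training, and all names and thresholds are assumptions.

```python
target_accuracy = 0.9
max_epochs = 20

accuracy = 0.0
epoch = 0
is_good_enough = accuracy > target_accuracy  # a comparison yields a bool

while not is_good_enough and epoch < max_epochs:
    epoch += 1
    accuracy = epoch / 10  # stand-in for a real training step
    is_good_enough = accuracy > target_accuracy

print(epoch, accuracy, is_good_enough)
```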

8. Type Checking with type() and isinstance()

Tip: Always validate your data types before passing data to ML models!
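The sketch below shows type() and isinstance() side by side, then uses isinstance() in a small validation helper. The helper validate_features is hypothetical, written only for this example.

```python
value = 3.0
exact_type_matches = type(value) is float           # exact type comparison
is_number = isinstance(value, (int, float))         # accepts either type

# bool is a subclass of int, so isinstance(True, int) is True.
bool_is_int = isinstance(True, int)


def validate_features(features):
    """Return True if features is a list of int/float values, else raise TypeError."""
    if not isinstance(features, list):
        raise TypeError(f"expected list, got {type(features).__name__}")
    for item in features:
        # Exclude bool explicitly, since it would otherwise pass as int.
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            raise TypeError(f"non-numeric feature: {item!r}")
    return True


accepted = validate_features([1, 2.5, 3])

try:
    validate_features([1, "2.5"])
    rejected = False
except TypeError:
    rejected = True

print(exact_type_matches, is_number, bool_is_int, accepted, rejected)
```

isinstance() is usually the better choice for validation because it also accepts subclasses, while type(x) is T matches only the exact type.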

9. Type Casting (Conversion)

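Real-world ML Example: Values read from CSV files often arrive as str (by default, Python's csv module returns every field as a string), so you must convert them to int or float before training.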
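A minimal sketch of casting, assuming a row has already been read from a CSV file as a list of strings (the values are made up):

```python
raw_row = ["42", "3.14", "cat"]  # every field is a str at this point

sample_id = int(raw_row[0])      # "42"   -> 42
feature = float(raw_row[1])      # "3.14" -> 3.14
label = raw_row[2]               # text labels stay as str

# int() does not parse decimal strings directly; it raises ValueError.
try:
    int("3.14")
    direct_cast_failed = False
except ValueError:
    direct_cast_failed = True

# Going through float first works, and int() then truncates toward zero.
truncated = int(float("3.14"))

print(sample_id, feature, label, direct_cast_failed, truncated)
```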

Summary Table

Data Type    Example       ML Usage
int          5             ID, count, label
float        3.14          Feature value, weight
str          "cat"         Label, path, text
bool         True          Condition, flag
list         [1, 2]        Features, samples
tuple        (2, 3)        Shape, coordinates
dict         {"a": 1}      Configs, mappings
set          {"a", "b"}    Unique values

Example: Simple ML Data Representation

  • id: int

  • features: list of float

  • label: str

This is a common structure before converting to pandas DataFrame or NumPy array.
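The structure described above can be sketched as a dict per sample. The feature values and label names below are invented for illustration.

```python
# One sample: an int id, a list of float features, and a str label.
sample = {
    "id": 1,
    "features": [5.1, 3.5, 1.4, 0.2],
    "label": "setosa",  # hypothetical label
}

# A small dataset is then a list of such dicts.
dataset = [
    sample,
    {"id": 2, "features": [6.2, 3.4, 5.4, 2.3], "label": "virginica"},
]

# Split into a feature matrix X and a label list y.
X = [record["features"] for record in dataset]
y = [record["label"] for record in dataset]

print(X)
print(y)
```

From here, X and y are in a shape that can be handed to a NumPy array or pandas DataFrame constructor.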
