Data Types
Nerd Cafe
In Python, every value has a type. A data type defines what kind of value something is and what you can do with it. Think of it like this:
A string holds text.
An int holds whole numbers.
A float holds decimal numbers.
A list holds a collection of items.
And so on...
Knowing data types is critical in machine learning because models expect specific types of data: numeric values for training, labels that often start out as strings, and structured containers such as lists and arrays.
Built-in Data Types in Python (Core Types)
1. Numeric Types
int: Integer (whole number)
float: Floating point (decimal)
complex: Complex number
2. Text Type
str: String
3. Sequence Types
list: Ordered, changeable, allows duplicates
tuple: Ordered, unchangeable (immutable)
range: Immutable sequence of integers, commonly used in for loops
4. Mapping Type
dict: Key-value pairs
5. Set Types
set, frozenset
6. Boolean Type
bool: True or False
7. Binary Types
bytes, bytearray, memoryview
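As a quick illustration of the categories above, the sketch below builds one value of each built-in type and prints its type name. The sample values are arbitrary and chosen only for illustration.

```python
# One example value per built-in type listed above (values are arbitrary)
examples = [
    42,                  # int
    3.14,                # float
    2 + 3j,              # complex
    "hello",             # str
    [1, 2, 3],           # list
    (1, 2, 3),           # tuple
    range(3),            # range
    {"a": 1},            # dict
    {1, 2, 3},           # set
    frozenset({1, 2}),   # frozenset
    True,                # bool
    b"abc",              # bytes
    bytearray(b"abc"),   # bytearray
    memoryview(b"abc"),  # memoryview
]

# type(value).__name__ gives the short name of each value's type
type_names = [type(value).__name__ for value in examples]
for value, name in zip(examples, type_names):
    print(f"{name:<12} {value!r}")
```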
Practical Examples + Notes for ML
1. Numbers: int, float, complex
ML Note:
When feeding data to machine learning models:
Use int or float.
Avoid complex unless working with signal processing or advanced math.
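A minimal sketch of the three numeric types in use. The variable names and values are made up for illustration.

```python
age = 25           # int
height = 1.75      # float
signal = 3 + 4j    # complex

# Mixing int and float in arithmetic produces a float
total = age + height
print(type(total).__name__)   # float

# True division always returns a float; floor division of two ints returns an int
print(7 / 2)    # 3.5
print(7 // 2)   # 3

# abs() of a complex number is its magnitude
magnitude = abs(signal)
print(magnitude)   # 5.0
```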
2. String: str
Strings are used to hold text like:
Category labels ("spam", "ham")
Column names in pandas
File paths ("data/train.csv")
ML Note: Use label encoding or one-hot encoding to convert strings into numbers for ML models.
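The two encodings mentioned in the note can be sketched in plain Python as follows. This is a hand-rolled illustration rather than a library implementation; in practice, libraries such as scikit-learn or pandas provide encoders for this. The `labels` list is made-up sample data.

```python
labels = ["spam", "ham", "ham", "spam"]

# Label encoding: map each distinct class to an integer
classes = sorted(set(labels))   # sorted so the mapping is reproducible
class_to_index = {name: index for index, name in enumerate(classes)}
encoded = [class_to_index[name] for name in labels]
print(encoded)   # [1, 0, 0, 1]

# One-hot encoding: one 0/1 column per class, in the order of `classes`
one_hot = [[1 if name == cls else 0 for cls in classes] for name in labels]
print(one_hot)   # [[0, 1], [1, 0], [1, 0], [0, 1]]
```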
3. List: list
Lists hold ordered items.
Can hold mixed types, but that’s discouraged in ML input.
ML Use Case:
Store a row of features.
Hold dataset samples before converting to NumPy or pandas.
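A small sketch of both use cases, with made-up feature values. It also shows that lists are mutable and that appending a list stores a reference to it, not a copy.

```python
# One sample: a row of numeric features (values are illustrative)
row = [5.1, 3.5, 1.4, 0.2]

# A small dataset: a list of rows
samples = []
samples.append(row)
samples.append([4.9, 3.0, 1.4, 0.2])

print(len(samples))     # 2
print(samples[1][0])    # 4.9

# Lists are mutable; `samples[0]` refers to the same object as `row`,
# so the change is visible through both names
row[0] = 5.0
print(samples[0][0])    # 5.0

# Extract one feature column with a list comprehension
first_feature = [r[0] for r in samples]
print(first_feature)    # [5.0, 4.9]
```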
4. Tuple: tuple
Similar to a list, but immutable (cannot be changed).
Often used for coordinates and other fixed-size data.
ML Note: Use a tuple for fixed data that should not change (e.g., image shape (224, 224, 3)).
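The image-shape example can be sketched like this. Unpacking reads the values, and an attempt to modify the tuple raises a TypeError.

```python
image_shape = (224, 224, 3)

# Tuple unpacking assigns each element to its own name
height, width, channels = image_shape
num_values = height * width * channels
print(num_values)   # 150528

# Tuples do not support item assignment
try:
    image_shape[0] = 256
except TypeError as error:
    print(f"Cannot modify a tuple: {error}")
```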
5. Dictionary: dict
Holds key-value pairs.
Fast lookup, widely used for configurations and mappings.
ML Use:
Store model settings: {"learning_rate": 0.01}
Map labels: {"cat": 0, "dog": 1}
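A short sketch of both uses. The "epochs" and "batch_size" keys are illustrative additions, not settings taken from any particular library.

```python
# Model settings ("epochs" is an illustrative extra key)
config = {"learning_rate": 0.01, "epochs": 10}
label_map = {"cat": 0, "dog": 1}

print(config["learning_rate"])   # 0.01

# .get() returns a default instead of raising KeyError when a key is absent
batch_size = config.get("batch_size", 32)
print(batch_size)                # 32

# Invert the mapping to decode numeric predictions back to label names
index_to_label = {index: name for name, index in label_map.items()}
print(index_to_label[1])         # dog
```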
6. Set: set
Unordered, no duplicates.
Useful to remove duplicates or check membership.
ML Tip: Use set() to find the unique classes in your target column.
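For instance, with a made-up target column stored as a plain list:

```python
target = ["cat", "dog", "cat", "bird", "dog"]

# set() drops duplicates; the resulting order is not guaranteed
unique_classes = set(target)
print(len(unique_classes))       # 3

# Membership checks on a set are fast
print("cat" in unique_classes)   # True

# Sort when a stable order is needed, e.g. for building a label mapping
print(sorted(unique_classes))    # ['bird', 'cat', 'dog']
```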
7. Boolean: bool
Used in:
Conditional statements
Controlling training loops
Evaluation (e.g., accuracy > 0.9)
ML Use:
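A minimal sketch of a boolean used for evaluation and for stopping a training loop early. The accuracy values are simulated, and no real model is trained here.

```python
target_accuracy = 0.9

# Evaluation: a comparison produces a bool
accuracy = 0.93
is_good_enough = accuracy > target_accuracy
print(is_good_enough)   # True

# Controlling a loop: stop as soon as the target is reached
simulated_accuracy = [0.70, 0.85, 0.92, 0.95, 0.96]   # made-up values per epoch
epochs_run = 0
for epoch_accuracy in simulated_accuracy:
    epochs_run += 1
    if epoch_accuracy > target_accuracy:
        break

print(epochs_run)   # 3
```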
8. Type Checking with type() and isinstance()
Tip: Always validate your data types before passing to ML models!
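The sketch below contrasts the two functions and shows one way to validate a row of features. The helper `all_numeric` is a hypothetical name introduced here for illustration.

```python
value = 3.14
print(type(value))                       # <class 'float'>
print(isinstance(value, float))          # True
print(isinstance(value, (int, float)))   # True: a tuple checks several types at once

# isinstance() respects inheritance: bool is a subclass of int
print(isinstance(True, int))             # True
print(type(True) is int)                 # False


def all_numeric(row):
    """Return True if every item is an int or float (bools are rejected)."""
    return all(
        isinstance(item, (int, float)) and not isinstance(item, bool)
        for item in row
    )


print(all_numeric([1, 2.5, 3]))      # True
print(all_numeric([1, "2.5", 3]))    # False
```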
9. Type Casting (Conversion)
Real-world ML Example: Values read from CSV files often arrive as str (for example, when using Python's csv module). Convert them to int or float before using them as features.
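A sketch of this conversion using the standard-library csv module. The CSV content is an in-memory stand-in for a real file, and the column names are made up.

```python
import csv
import io

# In-memory stand-in for a CSV file (column names are illustrative)
csv_text = "age,height\n25,1.75\n31,1.80\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every value comes back as a string
print(type(rows[0]["age"]).__name__)   # str

# Cast each column to the type the model needs
ages = [int(row["age"]) for row in rows]
heights = [float(row["height"]) for row in rows]
print(ages)      # [25, 31]
print(heights)   # [1.75, 1.8]

# int() truncates floats toward zero
print(int(3.99))   # 3

# int() rejects strings that contain a decimal point
try:
    int("3.5")
except ValueError:
    converted = int(float("3.5"))   # go through float first
    print(converted)                # 3
```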
Summary Table
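| Type | Example | Common ML use |
| --- | --- | --- |
| int | 5 | ID, count, label |
| float | 3.14 | Feature value, weight |
| str | "cat" | Label, path, text |
| bool | True | Condition, flag |
| list | [1, 2] | Features, samples |
| tuple | (2, 3) | Shape, coordinates |
| dict | {"a": 1} | Configs, mappings |
| set | {"a", "b"} | Unique values |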
Example: Simple ML Data Representation
id: int
features: list of float
label: str
This is a common structure before converting to pandas DataFrame or NumPy array.
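One possible way to write this structure is a list of dictionaries, one per sample. The values below are made up, and the conversion to pandas or NumPy is not shown.

```python
# Each sample has an int id, a list of float features, and a str label
dataset = [
    {"id": 1, "features": [0.5, 1.2, 3.3], "label": "cat"},
    {"id": 2, "features": [0.1, 0.7, 2.9], "label": "dog"},
]

# Check that every sample follows the expected types
for sample in dataset:
    assert isinstance(sample["id"], int)
    assert all(isinstance(x, float) for x in sample["features"])
    assert isinstance(sample["label"], str)

# Split into a feature matrix and a label vector
X = [sample["features"] for sample in dataset]
y = [sample["label"] for sample in dataset]
print(X)   # [[0.5, 1.2, 3.3], [0.1, 0.7, 2.9]]
print(y)   # ['cat', 'dog']
```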