PostgreSQL and Pandas

How to write a Pandas dataframe into PostgreSQL in Python
Data Engineering
Python
Author

Jan Kirenz

Published

20. Dezember 2021

Modified

22. Dezember 2023


In this tutorial, you will learn how to write a pandas dataframe into your PostgreSQL database.

Prerequisites:

  • You need to have PostgreSQL installed and running on your machine. This means you must actually start and initialize the PostgreSQL server after you have completed the installation.

Installation of Python modules

First, we use Anaconda and pip to install some Python modules. Open your terminal (macOS) or your Anaconda Command Prompt (Windows) and enter:

conda create -n postgres python=3.11 pip

Activate the environment:

conda activate postgres

Let’s install some packages. Psycopg is a PostgreSQL adapter for the Python programming language and SQLAlchemy is a Python SQL toolkit:

pip install psycopg2-binary sqlalchemy ipykernel jupyter pandas

Database connection

To connect to your database, a PostgreSQL instance must be running. Open Postgres on your machine and confirm that your database server is up.

Now we import sqlalchemy as sa and create a PostgreSQL engine to connect to our database.

import sqlalchemy as sa

engine = sa.create_engine('postgresql://your_username:your_password@localhost:your_port/your_database')

In the connection string, replace your_username, your_password, your_port and your_database with your own values.

Here are some standard values for a common PostgreSQL installation:

  • your_username: postgres
  • your_port: 5432 or 5433 or 5438, … (you can find your port number in the settings of your PostgreSQL database)
  • your_database: postgres

To learn more about the engine configuration, visit the documentation.
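Note that if your password contains special characters such as @, : or /, pasting it directly into the connection string will break the URL. One way around this is to percent-encode the password first with Python's standard library. A minimal sketch (the password shown is a made-up placeholder):

```python
from urllib.parse import quote_plus

# Placeholder password -- replace with your own value
password = "p@ss/word"

# Percent-encode characters that would otherwise break the URL
encoded = quote_plus(password)
print(encoded)  # p%40ss%2Fword

url = f"postgresql://your_username:{encoded}@localhost:5432/your_database"
```

The encoded string can then be used inside sa.create_engine() exactly like a plain password.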

Write pandas dataframe into PostgreSQL

Let’s create some data:

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/kirenz/datasets/master/gapminder.csv")

Now we use our engine to write the pandas dataframe into our PostgreSQL database, naming the table gapminder:

df.to_sql('gapminder', engine, if_exists='replace')
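Two to_sql parameters are worth knowing: if_exists accepts 'fail', 'replace' or 'append', and index=False stops pandas from writing the dataframe index as an extra column. A minimal sketch of a full round trip, using an in-memory SQLite connection so it runs without a Postgres server (the tiny dataframe is made up for illustration):

```python
import sqlite3

import pandas as pd

# A tiny stand-in dataframe (illustrative values, not the gapminder data)
df = pd.DataFrame({"country": ["Ghana", "Peru"], "pop": [34_000_000, 34_000_000]})

conn = sqlite3.connect(":memory:")

# Write without the index column; replace the table if it already exists
df.to_sql("demo", conn, if_exists="replace", index=False)

# Read the table back into a dataframe
out = pd.read_sql_query("SELECT * FROM demo", conn)
print(len(out))  # 2
conn.close()
```

With the Postgres engine from above, the same call is df.to_sql('gapminder', engine, if_exists='replace', index=False).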

Show a list of all tables in the database:

# Querying for the list of table names
with engine.connect() as connection:
    tables = connection.execute(sa.text("SELECT table_name FROM information_schema.tables WHERE table_schema='public'"))
    for table in tables:
        print(table[0])
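As an alternative to querying information_schema by hand, SQLAlchemy's inspector can list the table names directly, and the same call works against Postgres or SQLite. A minimal sketch (shown here against an in-memory SQLite engine so it runs standalone):

```python
import pandas as pd
import sqlalchemy as sa

# In-memory SQLite engine as a stand-in for the Postgres engine
engine = sa.create_engine("sqlite://")

pd.DataFrame({"a": [1]}).to_sql("gapminder", engine, index=False)

# The inspector reflects schema information from the live database
print(sa.inspect(engine).get_table_names())  # ['gapminder']
```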

Code to drop the table from the database:

sql = sa.text("DROP TABLE IF EXISTS gapminder")

with engine.begin() as connection:
    connection.execute(sql)