Creating SQL Insert Statements with Python Faker

Having a reliable data set to test is crucial when working with data analysis or developing database systems. However, acquiring accurate data can often be challenging due to privacy concerns and availability issues. This is where the Python library comes into play. Faker allows you to generate realistic fake data, which can be used for testing and analysis purposes. In this article, we’ll explore how to use Faker to create SQL insert statements for customer information for a hypothetical business analysis.

Use Case

You need to create a dataset of customer information that includes personal details, contact information, and creation timestamps. This data will simulate a database system that tracks customer interactions, sales transactions, and inventory management.

Setting Up

First, ensure you have Faker installed. You can install it via pip if you haven’t already:

pip install faker

Imports

We’ll also use the datetime module for handling date and time operations, so let’s import faker and datetime modules to our script:

from faker import Faker
from datetime import datetime

Initializing Faker

Create an instance of the Faker class:

# Initialize Faker 
fake = Faker()

For this example, let’s generate data for customers located in various cities. Here’s a list of cities we will use:

# Provided list of cities: 
wanted_cities = [ "Berwyn", "Chicago", "Chicago Heights", "Elgin", "Elmhurst"]

Generating Random Data

Random Timestamps

To generate a random timestamp within a specified date range, we can use the following function:

# Function to generate a random timestamp within a given date range
def random_date(start, end):
    return fake.date_time_between(start_date=start, end_date=end).strftime('%Y-%m-%d %H:%M:%S')

Random Phone Numbers

We can create a custom function to generate phone numbers in the desired format:

# Function to generate a random phone number in the format 111-111-1111
def random_phone_number():
    return str(fake.random_number(digits=3)) + '-' + str(fake.random_number(digits=3)) + '-' + str(fake.random_number(digits=4))

Generating Customer Data

Set the date range for customer creation and update timestamps:

# Date range
start_date = datetime(2024, 1, 1)
end_date = datetime(2024, 3, 31)

Now, let’s generate 10 random customer insert statements:

# Generate 10 random customer insert statements
customer_inserts = []
for _ in range(10):
    first_name = fake.first_name()
    last_name = fake.last_name()
    email = fake.email()
    phone = random_phone_number()  # Use the custom phone number generation function
    address = fake.street_address()
    city = fake.random_element(elements=wanted_cities)  # Randomly select a city from the list of Illinois cities
    state = 'IL'  # Set state to IL
    zip_code = fake.zipcode_in_state(state_abbr='IL')
    created_at = random_date(start_date, end_date)
    
    insert_statement = f"INSERT INTO customer (first_name, last_name, email, phone, address, city, state, zip_code, created_at, updated_at) VALUES ('{first_name}', '{last_name}', '{email}', '{phone}', '{address}', '{city}', '{state}', '{zip_code}', '{created_at}');"
    customer_inserts.append(insert_statement)

To verify our generated data, let’s display the first 5 insert statements:

# Display the first 5 generated insert statements for verification
customer_inserts[:5]

Conclusion

Using Faker in Python, you can quickly generate realistic fake data for various applications. Whether you need customer data, product data, or any other type of dataset, Faker provides a simple and flexible way to create it. Incorporating fake data into your testing and analysis workflows ensures that your systems are robust and ready for real-world use.

Leave a comment