Easy Steps to Move Raw Data from FTP to S3 Storage

Nasrul Hasan

I quit smoking 100+ hours ago.

Instead of lighting a cigarette, I shipped my first public PyPI package: extract-load-s3 – a lightweight utility to move files from FTP → S3 with multipart upload and validation. It handles very large files smoothly.

PyPI link: https://pypi.org/project/extract-load-s3/

Why I built it

Believe me, every AWS data engineer has faced this:

  • Connect to a remote server and download raw files that need to be copied to AWS S3 for further processing.

  • When the file is large, we have to write it to S3 using a multipart upload, and we also need some validation that the uploaded file matches the file on the FTP server.

Features

  • FTP → S3

  • Multipart upload for large files

  • SHA256 validation
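The SHA256 validation idea can be sketched as a streaming hash, so even very large files never have to fit in memory. This is a minimal illustration, not the package's actual implementation; the chunk size is an arbitrary choice:

```python
import hashlib

def sha256_of_file(path, chunk_size=8 * 1024 * 1024):
    """Hash a file in 8 MB chunks so memory use stays constant."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Computing this digest on the FTP side and again after the S3 upload, then comparing the two, is enough to detect a corrupted or truncated transfer.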

Install

 pip install extract-load-s3

Usage

After installing the package, simply run it from the command line:

 extract-load-s3 \
    --flow sftp_to_s3 \
    --file_name "/file/path/in/ftp" \
    --s3_bucket raw \
    --ssh_host 192.168.1.15 \
    --ssh_user hasan \
    --ssh_password 1234 \
    --aws_endpoint_url http://localhost:4566 \
    --aws_access_key_id test \
    --aws_secret_access_key test

Here --s3_bucket is the S3 bucket name, and --ssh_host, --ssh_user, and --ssh_password are the remote server's address and credentials. --aws_endpoint_url is only needed for LocalStack (skip it otherwise), and the AWS access/secret keys are optional – by default they are read from your AWS CLI configuration.
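Under the hood, a multipart upload boils down to splitting the file into numbered parts; each part would then be sent with S3's UploadPart call and the whole thing finalized with CompleteMultipartUpload. A minimal sketch of the splitting step, assuming a 16 MB part size (the package's actual internals may differ):

```python
MB = 1024 * 1024

def iter_parts(path, part_size=16 * MB):
    """Yield (part_number, chunk) pairs for a multipart upload.

    S3 requires parts of at least 5 MB (except the last one),
    and part numbers start at 1.
    """
    with open(path, "rb") as f:
        part_number = 1
        while chunk := f.read(part_size):
            yield part_number, chunk
            part_number += 1
```

With boto3 you rarely need to do this by hand: `upload_file` with a `TransferConfig(multipart_threshold=..., multipart_chunksize=...)` performs the multipart upload automatically.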

This is post 001 of The Data Pipeline – where I’ll share real production code, cost sheets, and failures.

If you're building pipelines, hit follow – I ship every week.