Easy Steps to Move Raw Data from FTP to S3 Storage


I quit smoking 100+ hours ago.
Instead of lighting a cigarette, I shipped my first public PyPI package: extract-load-s3 – a lightweight utility to move files from FTP/SFTP → S3 with multipart upload and validation. It handles very large files smoothly.
PyPI link: https://pypi.org/project/extract-load-s3/
Why I built it
Believe me, every AWS data engineer has faced this:
You connect to a remote server and download raw files that need to be copied to AWS S3 for further processing. When a file is big, you have to write it to S3 using a multipart upload, and you also need some kind of validation that the uploaded file is identical to the file on the FTP server.
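To make that concrete, here is a minimal sketch of the streaming pattern this problem calls for: read the file from SFTP in chunks and push each chunk as an S3 multipart part, so nothing ever has to fit in memory or on local disk. This is an illustration of the general approach, not the package's actual source; it assumes paramiko and boto3, and names like stream_sftp_to_s3 and the 64 MiB part size are my own choices.

# Illustrative sketch of SFTP -> S3 multipart streaming,
# not the package's actual internals.
import boto3
import paramiko

PART_SIZE = 64 * 1024 * 1024  # 64 MiB; S3 requires parts of at least 5 MiB

def stream_sftp_to_s3(host, user, password, remote_path, bucket, key):
    # Open the SFTP connection to the remote server.
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=user, password=password)
    sftp = ssh.open_sftp()

    s3 = boto3.client("s3")
    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    try:
        with sftp.open(remote_path, "rb") as remote_file:
            part_number = 1
            while True:
                # Read one part's worth of bytes and ship it straight to S3.
                chunk = remote_file.read(PART_SIZE)
                if not chunk:
                    break
                resp = s3.upload_part(
                    Bucket=bucket, Key=key,
                    UploadId=upload["UploadId"],
                    PartNumber=part_number, Body=chunk,
                )
                parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
                part_number += 1
        s3.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload["UploadId"],
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so the incomplete parts don't linger in the bucket.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload["UploadId"])
        raise
    finally:
        sftp.close()
        ssh.close()

The abort on failure matters: incomplete multipart parts don't show up as objects, but S3 still bills you for storing them until they are aborted or a lifecycle rule cleans them up.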
Features
FTP/SFTP → S3
Multipart upload for large files
SHA256 validation (see the sketch below)
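The validation feature is worth a quick sketch of the principle: hash the source file and the uploaded object with SHA256 and compare the digests. The helpers below are illustrative, assuming boto3 for S3 and an already-open paramiko SFTP client; the names are mine, not the package's API.

# Illustrative sketch of SHA256 validation, not the package's actual code.
import hashlib

import boto3

def sha256_of_stream(fileobj, chunk_size=8 * 1024 * 1024):
    # Hash in 8 MiB chunks so memory stays flat even for multi-GB files.
    digest = hashlib.sha256()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()

def upload_matches_source(sftp, remote_path, bucket, key):
    # sftp is an open paramiko SFTPClient, as in the earlier sketch.
    s3 = boto3.client("s3")
    with sftp.open(remote_path, "rb") as remote_file:
        source_digest = sha256_of_stream(remote_file)
    # GetObject returns a StreamingBody, which supports read() like a file.
    s3_body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    return source_digest == sha256_of_stream(s3_body)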
Install
pip install extract-load-s3
Usage
After installing the package, run it from the command line with your arguments:
extract-load-s3 \
  --flow sftp_to_s3 \
  --file_name "/file/path/in/ftp" \
  --s3_bucket raw \
  --ssh_host 192.168.1.15 \
  --ssh_user hasan \
  --ssh_password 1234 \
  --aws_endpoint_url http://localhost:4566 \
  --aws_access_key_id test \
  --aws_secret_access_key test
Argument notes:
--s3_bucket: destination bucket name
--ssh_host / --ssh_user / --ssh_password: remote server address and credentials
--aws_endpoint_url: skip if you are not using LocalStack
--aws_access_key_id / --aws_secret_access_key: optional; by default they are read from your AWS CLI configuration
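Once the command finishes, you can confirm the object landed. This check is not part of the package; it is just boto3 pointed at the same LocalStack endpoint and test credentials as the example above:

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",  # drop this line for real AWS
    aws_access_key_id="test",
    aws_secret_access_key="test",
)
# List everything in the raw bucket with each key's size in bytes.
for obj in s3.list_objects_v2(Bucket="raw").get("Contents", []):
    print(obj["Key"], obj["Size"])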
This is post 001 of The Data Pipeline – where I’ll share real production code, cost sheets, and failures.
If you're building pipelines, hit follow – I ship every week.