Machine learning systems are not built in Jupyter notebooks; for the most part, they are much more complex systems.
In my self-learning journey, I wanted to learn how to put different systems and technologies together. So I picked a simple, not-too-complicated problem, car price prediction, to understand how it is done at a small scale.
This project covers everything from data scraping to deployment. Here’s a quick overview:
🔍 Data Scraping
- Scrapy: Developed a web scraper to extract car data and saved it as a CSV.
- PostgreSQL: Stored the scraped data in a local PostgreSQL database.
- GCP VM Instance: Deployed the scraper using ScrapeOps, scheduled to run daily at midnight.
- Data Curation: Halted scraping upon reaching a sufficient number of entries.
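To give a sense of what the scraping step looks like, here is a minimal sketch of the spider (the listing URL, CSS selectors, and field names are placeholders, not the exact ones from the project):

```python
import scrapy


class CarSpider(scrapy.Spider):
    # Minimal sketch: one item per listing card, with pagination
    name = "cars"
    start_urls = ["https://example.com/cars"]  # placeholder listing page

    def parse(self, response):
        # Each listing card becomes one row in the output
        for card in response.css("div.listing"):
            yield {
                "title": card.css("h2::text").get(),
                "year": card.css(".year::text").get(),
                "mileage": card.css(".mileage::text").get(),
                "price": card.css(".price::text").get(),
            }

        # Follow pagination until there are no more pages
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

Running `scrapy crawl cars -O cars.csv` dumps the scraped items to a CSV, which can then be loaded into the PostgreSQL database.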
🤖 Machine Learning
- Imported data from PostgreSQL for local storage.
- Conducted data analysis and preprocessing.
- Built, evaluated, and fine-tuned a machine learning model.
- Saved the model as a pickle file for easy reuse.
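Condensed, the modelling step looks roughly like this (the connection string, column names, and choice of regressor are assumptions for illustration, not the exact project code):

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sqlalchemy import create_engine

# Pull the scraped table out of PostgreSQL into a local DataFrame
engine = create_engine("postgresql://user:password@localhost:5432/cars")
df = pd.read_sql("SELECT * FROM car_listings", engine)

X = df.drop(columns=["price"])
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One-hot encode categorical columns, pass numeric ones through, then fit a regressor
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["make", "model", "fuel_type"])],
    remainder="passthrough",
)
pipeline = Pipeline([("prep", preprocess), ("reg", RandomForestRegressor(random_state=42))])
pipeline.fit(X_train, y_train)

print("MAE:", mean_absolute_error(y_test, pipeline.predict(X_test)))

# Persist the fitted pipeline so the API can load it later
with open("model.pkl", "wb") as f:
    pickle.dump(pipeline, f)
```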
🌐 FastAPI
- Created an API for processing new data and making predictions.
- Hosted the API on port 8000 for easy access.
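A minimal sketch of the prediction endpoint (the input fields mirror the assumed training columns above; the real schema may differ):

```python
import pickle

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Car Price API")

# Load the fitted pipeline saved during training
with open("model.pkl", "rb") as f:
    pipeline = pickle.load(f)


class CarFeatures(BaseModel):
    make: str
    model: str
    fuel_type: str
    year: int
    mileage: float


@app.post("/predict")
def predict(car: CarFeatures):
    # The pipeline expects a DataFrame with the same columns used at training time
    features = pd.DataFrame([car.dict()])
    price = pipeline.predict(features)[0]
    return {"predicted_price": float(price)}
```

Running `uvicorn main:app --host 0.0.0.0 --port 8000` serves it on port 8000.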
🎨 Gradio
- Designed a simple UI for visualization purposes.
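The UI is just a thin layer that forwards user input to the API; a rough sketch (labels, defaults, and the local API address are assumptions):

```python
import gradio as gr
import requests

API_URL = "http://localhost:8000/predict"  # assumed local address of the FastAPI service


def estimate_price(make, model, fuel_type, year, mileage):
    # Forward the form values to the prediction endpoint and return its answer
    payload = {
        "make": make,
        "model": model,
        "fuel_type": fuel_type,
        "year": int(year),
        "mileage": float(mileage),
    }
    response = requests.post(API_URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["predicted_price"]


demo = gr.Interface(
    fn=estimate_price,
    inputs=["text", "text", "text", gr.Number(label="Year"), gr.Number(label="Mileage")],
    outputs=gr.Number(label="Predicted price"),
    title="Car Price Predictor",
)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
```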
🐳 Docker
- Dockerized the API and Gradio app into a single container.
- Pushed the container to Docker Hub for easy distribution.
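I won't reproduce the full container setup here, but one common pattern for keeping the API and the UI in a single container is to serve them from one process by mounting the Gradio app onto the FastAPI app (a generic sketch, assuming hypothetical `api.py` and `ui.py` modules holding the app and the demo above; the actual container may be wired differently):

```python
import gradio as gr
import uvicorn

from api import app   # hypothetical module with the FastAPI app
from ui import demo   # hypothetical module with the Gradio interface

# Expose the Gradio UI under /ui while /predict stays on the same port
app = gr.mount_gradio_app(app, demo, path="/ui")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```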
🔄 GitHub Actions
- Configured CI/CD to build and push a Docker image on pull requests to the main branch, keeping the published image up to date.
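For reference, a workflow of that shape could look something like this (the image name, secrets, and action versions are assumptions, not the project's exact configuration):

```yaml
name: build-and-push

on:
  pull_request:
    branches: [main]

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push the image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ secrets.DOCKERHUB_USERNAME }}/car-price-app:latest
```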
I’ve detailed the entire process, insights, and key takeaways in my Medium post:
https://medium.com/@chidubemndukwe/ml-from-data-scraping-to-deployment-fa7ddc5fab5c
Here is a quick project demo:
Stay nerdy