Prerak Panwar

Software and Data Engineer building data-driven applications that support smarter decisions

About Me

I'm a data-driven engineer who enjoys designing scalable systems, streamlining pipelines, and uncovering insights that power intelligent applications.

My Journey

With over 4 years of experience in data engineering and analytics, I specialize in building scalable data pipelines, automating workflows, and delivering insight-driven solutions. I'm passionate about turning complex data into actionable outcomes through robust systems, clean code, and effective visualizations.

Quick Facts

Danville, California 94526 (Open to Relocation)
4+ Years of Experience
Master's in Computer and Information Science | 3.97 GPA
Code & Data Explorer

Education

My academic background and continuous learning journey in computer science and technology.

Master's

Computer and Information Science

University of Massachusetts (umassd.edu)

September 2023 - May 2025 | GPA: 3.97

Bachelor of Technology

Computer Science and Engineering

Graphic Era University (geu.ac.in)

July 2015 - May 2019 | GPA: 3.50

Work Experience

From code to insights: my story of turning ideas into real-world software and data solutions.

Data Engineer, Analyst

May 2024 - August 2024
New England Coastal Wildlife Alliance (necwa.org)
  • Applied Random Forest to impute missing values, improving data completeness by 30% and reducing manual input collection
  • Developed scalable ETL workflows in Python, cutting processing time by 30% and enabling faster analytics and reporting
  • Deployed Power BI dashboards with KPIs and drill-downs, cutting manual reporting time by 50% for over 4M records

Data Engineer

August 2020 - July 2023
  • Enhanced SQL query performance by 15% through indexing and I/O optimization, speeding up dashboards and ad-hoc analytics
  • Built fault-tolerant microservices and APIs to ensure reliable data tracking and integration across systems
  • Maintained historical data accuracy by implementing SCD logic in reporting layers, enhancing trend insights
  • Migrated to Cassandra and improved query speed by 40% through data modeling
  • Built secure, ML-ready ETL pipelines to streamline data access and boost model accuracy by 20%

Data Engineer, Analyst

June 2019 - July 2020
  • Resolved 600+ data issues with Python and SQL validation, cutting support tickets by 40%
  • Automated Power BI reports, saving 2 hours daily and enabling real-time monitoring of data quality issues
  • Managed critical data tasks in Agile via Jira, accelerating analytics delivery
  • Contributed to team meetings and project planning

Featured Projects

A collection of end-to-end projects showcasing my skills in real-time systems, machine learning, and cloud-based data engineering.

Real-Time Fraud Detection

Real-time fraud detection system built with Kafka, Python, MySQL, and XGBoost. Deployed via Docker, it streamed 1.8M transactions and provides LLM-powered insights using LangChain and RAG.

Python, MySQL, Kafka, Docker, Machine Learning

Movie Recommendation System

Content-based movie recommendation system using Natural Language Processing and vectorization. Achieved 80% accuracy in matching top suggestions to user preferences.

Python, NLP, Vectorization, Cosine Similarity, Machine Learning

Scalable ETL pipeline for Analytics

End-to-end ETL pipeline on Azure (ADF, ADLS, Synapse) with Databricks and Spark. Reduced manual effort by 90% and enabled real-time analytics with a Medallion Architecture.

Azure Services, Databricks, Python, Spark, SQL, Power BI

Skills & Technologies

Technologies and tools I leverage to transform ideas into impactful, data-driven solutions and seamless user experiences.

Programming

  • SQL, NoSQL, T-SQL
  • Apache Spark, REST APIs, Java (Familiar)
  • Python (NumPy, pandas, Matplotlib, Seaborn, Scikit-learn, Flask, OOP)

Tools & Platforms

  • PostgreSQL, MySQL, Oracle
  • Databricks, Airflow, Kafka, Docker
  • Excel, Power BI, Tableau, Alteryx, dbt
  • AWS (IAM, S3, EC2, Redshift, SageMaker)
  • Azure (Data Factory, Synapse Analytics, Data Lake Storage)

Statistics & Machine Learning

  • Regression, Classification
  • NLP, Time Series Forecasting, Random Forest, XGBoost
  • Decision Trees, KNN, K-means Clustering
  • Hypothesis Testing, A/B Testing, EDA

GenAI

  • LLMs (OpenAI, Hugging Face), Transformer Architecture
  • LangChain, Retrieval-augmented generation (RAG)
  • Prompt Engineering, Vector Databases, Model Context Protocol (MCP)

Project & Workflow Management

  • Agile (Scrum), Jira, CI/CD
  • Git, GitHub

Data Management & Architecture

  • Requirements Gathering, Data Modeling
  • Source-to-Target Mapping, Data Warehousing
  • Data Quality Assurance (QA), Governance

Certifications

Verified certifications showcasing my expertise across database management, cloud engineering, and backend systems.

Generative AI Certified Professional
Oracle

Azure Fundamentals
Microsoft

Azure Data Engineer Associate
Microsoft

Associate Database Administrator
MongoDB

Data Analytics Specialization
Google

Python Developer
Udemy

Get In Touch

I am always open to discussing new opportunities, interesting projects, or just having a friendly chat about technology.

Let's Connect