HELLO, I AM MADHAV KUMAR

Passionate
Data Professional

With 10 years of overall experience, I specialize in designing, developing, and executing massive data pipelines, data lakes, and scalable ingestion systems on the Azure Cloud Platform.

Data Engineer Profile

Professional Summary

Building Scalable Data Ecosystems

My approach blends a deep understanding of Big Data technologies with modern cloud architecture. Whether it's Snowflake optimization, Apache Spark processing, or automated CI/CD data pipelines, I bridge the gap between raw data and actionable business intelligence.

I am a results-driven Data Engineer with 10 years of experience, including 5 years focused on designing and implementing scalable data ingestion pipelines using Azure Data Factory. Over the years, I have implemented data lake requirements at numerous large companies using the Big Data technology stack (Python, Spark, Hadoop, Hive).

I am proficient in leveraging Azure Databricks and Spark for distributed processing, and adept at designing cloud-based data warehouse solutions using Snowflake on Azure. I work collaboratively with stakeholders to implement logical and physical data models, ensuring performance, scalability, and data integrity.

Snowflake Expert

Deep expertise in multi-cluster warehouses, Time Travel, cloning, and performance optimization.

Big Data & Spark

Strong track record optimizing Spark jobs and distributed processing pipelines.

Streaming & Real-time

Real-time data architecture using Kafka and Spark Streaming.

CI/CD DevOps

Automating robust data pipeline deployments in Azure DevOps.

Work Experience

Azure Snowflake Data Engineer @ Walmart

Aug 2022 – Present

  • Designed and implemented scalable data ingestion pipelines using Azure Data Factory, ingesting data from SQL, CSV, and REST APIs.
  • Developed data processing workflows using Azure Databricks, leveraging Spark for distributed transformation tasks.
  • Designed a cloud-based data warehouse solution using Snowflake, creating schemas, tables, and views for efficient data retrieval.
  • Implemented partitioning, indexing, and caching strategies in Snowflake to reduce query latency.
  • Implemented real-time data processing solutions using Kafka and Spark Streaming for high-volume streaming data.
  • Developed an automated CI/CD framework for data pipelines using Jenkins and Azure DevOps.
Azure Databricks ADF Snowflake Kafka Spark SQL

Azure Snowflake Data Engineer @ State Farm

Oct 2020 – Jul 2022 | Dallas, TX

  • Implemented end-to-end data pipelines using Azure Data Factory to extract, transform, and load (ETL) data into Snowflake.
  • Used Azure Data Lake Storage to store raw data, with robust partitioning and retention strategies.
  • Integrated Azure Logic Apps for orchestrating complex workflows and triggers.
  • Implemented advanced analytics/ML workflows using Azure Machine Learning and Snowflake.
  • Designed data archiving and retention strategies using Azure Blob Storage and Snowflake's Time Travel feature.
  • Optimized Spark configurations, caching, and data partitioning in Azure Databricks.
Azure Logic Apps Snowflake Time Travel PySpark Azure Purview

Big Data Developer @ Aetna Inc.

Jul 2019 – Sep 2020 | Hartford, CT

  • Maintained data pipelines using Sqoop, Flume, and Kafka to ingest and process customer behavioral data.
  • Performed data aggregation on large-scale datasets using Apache Spark, Scala, and Hive.
  • Integrated HBase with Hive on the Analytics Zone, optimizing tables for efficient queries.
  • Migrated data from RDBMS (Oracle) to Hadoop using Sqoop for processing.
  • Automated deployments using YAML scripts for faster releases.
Hadoop Scala Sqoop HBase Kafka

Big Data Developer @ Anthem

Apr 2018 – Jun 2019 | Chicago, IL

  • Prepared an ETL framework using Sqoop, Pig, and Hive to bring data from various sources.
  • Developed Spark Streaming applications for real-time sales analytics.
  • Utilized Spark-Cassandra Connector APIs for data migration and reporting.
  • Worked extensively on combiners, partitioning, and the distributed cache to improve MapReduce job performance.
MapReduce Hive Cassandra Spark Streaming

Data Warehouse Developer @ Mayo Clinic

May 2015 – Mar 2018 | Rochester, MN

  • Designed ETL data flows using SSIS to extract and migrate data from SQL Server, Access, and Excel.
  • Applied dimensional data modeling to Data Mart design, developing fact and dimension tables with slowly changing dimensions (SCDs).
  • Built cubes and dimensions with different architectures using SSAS and MDX scripting for business intelligence.
  • Developed parameterized, chart, dashboard, and scorecard reports in SSRS.
MS SQL Server SSIS SSAS SSRS

Data Warehouse Developer @ Allstate Insurance

Nov 2013 – Apr 2015 | Chicago, IL

  • Used the data warehouse to develop a Data Mart feeding downstream reports in Power BI.
  • Deployed SSIS Packages and created SQL Agent jobs for efficient package execution.
  • Developed stored procedures and triggers to facilitate consistent data entry.
  • Used Snowflake data sharing to share data externally without building complex pipelines.
Power BI C# SQL Profiler SharePoint

Technical Skills

Azure & Cloud Services

Azure Data Factory
Azure Databricks
Snowflake
Logic Apps
Function App
Azure DevOps
Azure Synapse

Big Data Technologies

MapReduce
Hive & Pig
PySpark & SparkSQL
Kafka
Spark Streaming
Oozie & Sqoop
Hortonworks / Cloudera

Languages & Databases

Python
Scala
SQL / PL-SQL
MS SQL Server
Oracle 11g/12c
Cosmos DB

ETL & Architecture

ETL Pipelines
SSIS / SSAS / SSRS
Data Warehousing
Dimensional Modeling
Data Marts
Change Data Capture

Licenses & Certifications

Amazon Web Services (AWS)

AWS Certified Cloud Practitioner

Amazon Web Services (AWS)

Introduction to Generative AI

  • Generative AI

Sololearn

JavaScript

  • JavaScript

Sololearn

SQL

  • SQL, MySQL, Query language

Udemy

Build Responsive Real-World Websites with HTML and CSS

  • HTML5, CSS, Set up
  • JavaScript

Sololearn

HTML5 Application Development Fundamentals

  • HTML5

Udemy

Python Object Oriented Programming

  • Python, OOP

Udemy

Java Database Connection: JDBC & MySQL

  • JDBC
  • MySQL

CITI Program

Research Investigators and Key Personnel

  • Communication

CITI Program

Responsible Conduct of Research

  • All Disciplines

Academic Background

Master of Science - MS, Computer Science

Oregon State University

Sep 2023 - Mar 2025 | CGPA: 3.86

Skills: Algorithms, Machine Learning, Database Management (DBMS), Data Science

Bachelor of Technology - BTech, Computer Science

Amrita Vishwa Vidyapeetham

Jul 2017 - Jun 2021 | CGPA: 7.68

Skills: Data Structures, Operating Systems, Algorithms, Big Data Analytics

High School Diploma (XI - XII)

Sasi Junior College, Velivennu

Jun 2015 - Jul 2017

High School Diploma (I - X)

St. Ann's E.M School, Rajahmundry

Jun 2002 - May 2015