Data Engineering

Hundreds of PB of data

Diverse technology stack

Python, Scala, Java and SQL

Data Quality & Governance

Great Expectations, Databricks, Data Quality Dashboards

OpenMetadata, Airflow

ETL Pipelines

Spark, Presto/Athena, Flink, Beam, Airflow, Prefect, Kafka, Kinesis

Machine Learning Engineering

SageMaker, MLflow, Kubeflow

Data Modeling

Dimensional modeling, normal forms, wide tables, dbt

Libraries

Creation of API client libraries, streaming data integration, dbt unit test libraries, open-source contribution to dbt-athena

Sample past projects

  • Data Quality Monitoring, Reporting and Alerting - Great Expectations, Grafana
  • SLA Alerting & Monitoring implementation - Airflow