Data Scientist · ML & Predictive Analytics · HealthTech

Bridging Data, Biology, and Intelligence.

Most enterprise data is broken. Healthcare data is the hardest version of that problem — regulated, fragmented, high-stakes, and unforgiving.

I spent 6 years solving it. That makes everything else a simpler version of the same challenge.

40h
weekly hours automated
85%
ML model accuracy
2,000+
telemedicine users
6+
years scaling 0→1
01 / About

The Algorithm Behind the Work

Problem

The hardest data in the world

Clinical records. Patient rights filings. National insurance claims. Data that is regulated, fragmented, incomplete — and where a wrong model costs lives, not just money.

Tension

Most data scientists never face this

Fintech has clean APIs. E-commerce has structured logs. Healthcare has none of that. 6 years in this environment forced a level of rigor — data quality, regulatory constraint, stakeholder pressure — that most never encounter.

Resolution

If it works here, it works anywhere

That's the edge. I bring healthcare-grade data discipline to every domain. Fintech, insurtech, e-commerce, logistics — the problems are simpler. The methods are the same.

Background

Started as a Biomedical Engineer — built medical devices, then moved into clinical operations. Spent 4 years in healthcare leadership before formalizing everything with an MSc in Data Science (EAFIT, 2024).

That path is the differentiator. Not just a data scientist who studied healthcare — a practitioner who lived it as a department director, auditor, and forensic analyst, then brought rigorous ML on top.

“Healthcare is my MBA in hard data. Six years solving the hardest case makes everything else a simpler version of the same problem.”

Education

MSc Data Science & Analytics

Universidad EAFIT

2023–2024

Specialist — Analytics & Big Data

IU Digital de Antioquia

2023

Specialist — Health Services Administration

Universidad de Antioquia

2022

Specialist — Biomedical Engineering

Universidad Pontificia Bolivariana

2017–2019

BSc Biomedical Engineering

Institución Universitaria ITM

2010–2016

Technical Arsenal

Languages · Python · 6 yrs · 95%
Languages · SQL · 6 yrs · 93%
BI · Power BI · 5 yrs · 91%
ML · scikit-learn / XGBoost · 5 yrs · 90%
Data Eng · Airflow / ETL · 4 yrs · 88%
BI · Tableau · 4 yrs · 84%
Cloud · GCP / BigQuery · 3 yrs · 83%
Data Eng · dbt · 3 yrs · 80%
AI · LangChain / AI Agents · 2 yrs · 78%
ML · TensorFlow / PyTorch · 3 yrs · 76%
Data Eng · Spark / Kafka · 2 yrs · 75%
DevOps · Docker / Kubernetes · 3 yrs · 74%
ML · NLP / SHAP · 2 yrs · 73%
Languages · R · 4 yrs · 72%
02 / Experience

Where the Work Happened

6+ years delivering data products in high-stakes, regulated environments — from rural public hospitals to national-scale health insurers.

Jan 2025 – Present

Business Intelligence Analyst

Savia Salud

National Health Insurer · 1M+ members · Colombia

40h/week automated
  • Integrated 4 heterogeneous regulated sources (national claims records, beneficiary registries, patient rights filings, clinical data) into a unified cloud data warehouse
  • Eliminated 100% of manual Excel reporting, freeing 40h/week of analyst capacity
  • Built geospatial predictive demand models at 85% accuracy, reshaping resource allocation for high-risk zones
  • Delivered Power BI dashboards with custom HTML/CSS visuals to 50+ C-suite and board-level executives
May 2023 – Dec 2025

Senior Science & Technology Specialist

TECNALIA Research & Innovation / MinCiencias

National Science Ministry · R&D Supervision · Colombia

20+ projects audited
  • Technical oversight of 20+ national health research projects with multi-million dollar public investment
  • Detected 30% of critical technical deviations in early project phases, preventing cost overruns
  • Validated ML model reproducibility and scientific deliverable quality for MinCiencias compliance
Aug 2024 – Jan 2025

Freelance Data Scientist & BI Developer

Upwork

Freelance — Global Clients · Remote

  • Built BI dashboards in Power BI, Tableau, and Looker Studio for clients across North America and Europe
  • Automated workflows and API integrations using n8n for e-commerce and SaaS clients
  • Digital marketing analytics: attribution modeling, funnel analysis, Meta Ads performance reporting
Aug 2020 – May 2022

Chief Scientific Officer & Data Lead

Hospital Santa Margarita

Rural Public Hospital · 250+ beds · Colombia

−35% ER wait times
  • Cut emergency wait times 35% using Python/SQL time-series demand modeling and staff reallocation
  • Built geospatial telemedicine routing for 2,000+ rural patients, reducing in-person costs 40%
  • Deployed 12 Power BI dashboards tracking 15 real-time clinical KPIs for hospital leadership
  • Forensic data analysis on 90+ medical liability cases, with 90% favorable resolution for the institution
Jun 2022 – Apr 2023

Data Science & Health Analytics Specialist

Metrosalud

Municipal Health Network · Medellín, Colombia

  • Data scientist embedded in a public child health program (Buen Comienzo) serving 10,000+ families
  • Designed data quality pipelines and reporting for sensitive population datasets
03 / Projects

Problems Solved

Every project starts with a broken process. Here's what I found, what I built, and what changed.

HealthTech · Business Intelligence
40h/wk saved

BI Ejecutivo — Savia Salud EPS

Problem

Analysts at a 1M+ member health insurer spent 40h/week manually assembling Excel reports for 50+ executives from 4 incompatible regulated sources (RIPS, BDUA, tutelas, clinical records).

Action

Designed a cloud data warehouse integrating all 4 sources with automated ETL pipelines in Python/Airflow. Built geospatial predictive models for demand forecasting. Delivered Power BI dashboards with custom HTML/CSS visuals and live refresh — zero manual intervention.
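
As a rough illustration of the integration step, the sketch below merges rows from four sources into one record per member by normalizing the identifier each source uses. It is a plain-Python stand-in, not the Airflow pipeline itself: the field names in SOURCE_KEY_FIELDS and the sample rows are hypothetical and do not reflect the real RIPS or BDUA schemas.

```python
from collections import defaultdict

# Hypothetical identifier column per source (assumption, not the real schemas).
SOURCE_KEY_FIELDS = {
    "rips": "num_documento",
    "bdua": "documento",
    "tutelas": "id_accionante",
    "clinical": "patient_id",
}


def normalize_id(raw):
    """Keep digits only and drop leading zeros so IDs match across sources."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    return digits.lstrip("0")


def unify(sources):
    """Group rows from every source under one normalized member ID."""
    members = defaultdict(lambda: defaultdict(list))
    for source_name, rows in sources.items():
        key_field = SOURCE_KEY_FIELDS[source_name]
        for row in rows:
            member_id = normalize_id(row.get(key_field, ""))
            if not member_id:
                continue  # rows without a usable identifier are skipped
            payload = {k: v for k, v in row.items() if k != key_field}
            members[member_id][source_name].append(payload)
    return {mid: dict(by_source) for mid, by_source in members.items()}


# Illustrative rows only.
sample = {
    "rips": [{"num_documento": "00123", "servicio": "urgencias"}],
    "bdua": [{"documento": "123 ", "regimen": "subsidiado"}],
    "tutelas": [{"id_accionante": "1.23", "motivo": "medicamentos"}],
    "clinical": [{"patient_id": "", "dx": "J45"}],
}
unified = unify(sample)
```

In a scheduled pipeline each source would typically be extracted by its own task, with a merge like this running as the final transformation before loading into the warehouse.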

Result

Eliminated 100% of manual Excel reports. Critical reporting cycle dropped from 3 days to 4 hours. Geospatial models hit 85% accuracy, reshaping resource allocation for high-risk zones.

Stack

Python · Airflow · Power BI · SQL · GCP · BigQuery
InsurTech · NLP · Machine Learning
1,841 claims analyzed

NLP + Interpretable ML — ARL Insurance Claims

Problem

1,841 temporary disability complaints (Jan–Jun 2025) from a Colombian ARL insurer sat unanalyzed. Manual review was slow, inconsistent, and couldn't surface systemic patterns across claim types.

Action

Built an end-to-end pipeline: NLP text classification of complaint narratives, interpretable ML models (SHAP explainability), an interactive Streamlit dashboard for claims analysts, and a full automation pipeline for future ingestion.
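
The idea of pairing a text classifier with per-prediction explanations can be sketched without any dependencies. The example below uses a tiny multinomial Naive Bayes model and reports per-token log-odds as the explanation. This is a deliberately simpler technique swapped in for illustration, not the scikit-learn models or SHAP values used in the project, and the training sentences and labels are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (accented characters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing, for illustration only."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.token_counts.values())
        return self

    def _log_prob(self, token, cls):
        total = sum(self.token_counts[cls].values())
        smoothed = self.token_counts[cls][token] + self.alpha
        return math.log(smoothed / (total + self.alpha * len(self.vocab)))

    def score(self, text, cls):
        prior = math.log(self.doc_counts[cls] / sum(self.doc_counts.values()))
        known = [t for t in tokenize(text) if t in self.vocab]
        return prior + sum(self._log_prob(t, cls) for t in known)

    def predict(self, text):
        return max(self.classes, key=lambda c: self.score(text, c))

    def explain(self, text, cls, other):
        """Per-token log-odds of cls versus other, largest contribution first."""
        known = {t for t in tokenize(text) if t in self.vocab}
        contribs = {t: self._log_prob(t, cls) - self._log_prob(t, other) for t in known}
        return sorted(contribs.items(), key=lambda kv: -kv[1])


# Invented complaint snippets and labels.
texts = [
    "payment delayed for disability leave",
    "delayed payment of benefit",
    "medical appointment cancelled again",
    "appointment with specialist denied",
]
labels = ["payment", "payment", "care", "care"]
model = TinyNaiveBayes().fit(texts, labels)
```

Calling model.explain(text, "payment", "care") returns the tokens that pushed a complaint toward the "payment" class, which is the kind of per-prediction evidence an analyst-facing dashboard can surface.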

Result

Surfaced hidden complaint clusters and risk patterns invisible to manual review. Delivered an analyst-facing dashboard and reproducible ML pipeline ready for production deployment.

Stack

Python · NLP · SHAP · scikit-learn · Streamlit · Pandas
HealthTech · Predictive Analytics
−35% ER wait times

Emergency Ops Optimization — Hospital Santa Margarita

Problem

A rural hospital's ER faced severe overcrowding. No data existed on demand patterns, staff allocation was reactive, and patients waited 2–3x longer than benchmarks. Medical liability cases piled up.

Action

Built time-series demand models in Python/SQL to predict ER load by hour and day. Deployed a geospatial telemedicine routing system to redirect low-acuity patients. Set up 12 Power BI dashboards tracking 15 KPIs in real time. Ran forensic data analysis for 90+ medical liability cases.
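
To make "ER load by hour and day" concrete, here is a minimal sketch of a seasonal-average baseline: it averages historical arrivals per (weekday, hour) slot and uses that average as the forecast. It is a simple baseline under stated assumptions, not the model that was deployed, and the timestamps are invented.

```python
from collections import defaultdict
from datetime import datetime


def hourly_profile(arrivals):
    """Mean arrivals per (weekday, hour) slot across the observed dates."""
    dates_by_weekday = defaultdict(set)
    counts = defaultdict(int)
    for ts in arrivals:
        dates_by_weekday[ts.weekday()].add(ts.date())
        counts[(ts.weekday(), ts.hour)] += 1
    # Divide by the number of observed dates for that weekday, so dates
    # with no arrivals in a given hour still pull the average down.
    return {
        slot: n / len(dates_by_weekday[slot[0]])
        for slot, n in counts.items()
    }


def forecast(profile, when):
    """Expected arrivals for the hour containing `when` (0.0 if never seen)."""
    return profile.get((when.weekday(), when.hour), 0.0)


# Invented arrivals on two Mondays.
arrivals = [
    datetime(2024, 1, 1, 10, 5),
    datetime(2024, 1, 1, 10, 30),
    datetime(2024, 1, 1, 10, 50),
    datetime(2024, 1, 8, 10, 15),
    datetime(2024, 1, 8, 14, 0),
]
profile = hourly_profile(arrivals)
next_monday_10am = forecast(profile, datetime(2024, 1, 15, 10, 0))
```

A baseline like this is also a useful yardstick: a more elaborate time-series model earns its place only if it beats the slot averages on held-out weeks.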

Result

Emergency wait times cut by 35%. Telemedicine covered 2,000+ rural users, reducing in-person costs by 40%. 90% favorable resolution rate on medical liability cases through data-driven forensic analysis.

Stack

Python · SQL · Power BI · Geospatial · Time Series
Computer Vision · Deep Learning
4-ch UV vision model

Computer Vision — Tetrachromatic Bird Recognition

Problem

Standard bird recognition systems fail to capture how birds actually perceive color — they see in 4 channels (UV included), not 3. Existing datasets and models ignored this entirely.

Action

Trained a multi-class CNN + Vision Transformer (BEiT) pipeline on GCP (PyTorch + OpenCV) that classifies species and simulates tetrachromatic UV vision. HDBSCAN clustering with UMAP projections for unsupervised species grouping. Deployed as a Streamlit app with interactive spectral heatmaps.

Result

First open-source model combining bird classification with tetrachromatic vision simulation. Validated improvements in clustering metrics (Silhouette Score, Calinski-Harabasz). Published as a reproducible ML pipeline.
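
For readers unfamiliar with the Silhouette Score mentioned above, the sketch below computes it from scratch with Euclidean distance. For each point, a is the mean distance to the other points in its own cluster, b is the lowest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). This is a teaching version under simple assumptions: every label counts as a cluster, so noise points (which HDBSCAN labels -1) would need to be filtered out first, and the sample points are invented.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # Distance from the point to itself is 0, so divide by len - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two well-separated groups, labelled correctly and then incorrectly.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```

Scores near 1 indicate tight, well-separated clusters; scores below 0 suggest points sit closer to a neighboring cluster than to their own.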

Stack

PyTorch · BEiT · OpenCV · HDBSCAN · UMAP · Streamlit · GCP
Data Engineering · MLOps
5 production pipelines

Data Engineering Portfolio — 5 Production Pipelines

Problem

Needed to demonstrate end-to-end data engineering competency across 5 real-world domains: unified data lakes, YouTube trends, social sentiment, security SIEM logs, and open traffic data.

Action

Built each pipeline with production-grade tooling: Airflow orchestration, Spark for distributed processing, Kafka for real-time streaming, dbt for transformations, Delta Lake for storage, and BERT/Isolation Forest for ML layers. All containerized with Docker Compose.

Result

5 end-to-end pipelines covering batch + streaming + ML inference, deployable via Docker. Covers the full modern data stack from ingestion to Grafana/Kibana dashboards.

Stack

Airflow · Spark · Kafka · dbt · Delta Lake · Docker · Python
AI Agents · Bioinformatics
NLP → BLAST queries

SeqScope — AI Agent for Genomic BLAST

Problem

Querying NCBI BLAST for genomic sequence analysis required deep bioinformatics knowledge and repetitive manual queries — a barrier for researchers without programming backgrounds.

Action

Built an autonomous AI agent with LangChain + OpenAI API that translates natural language into BLAST queries, parses results, and returns structured insights. Deployed on Streamlit for zero-install access.
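
The last hop of such an agent, turning a structured request into a BLAST submission, might look like the sketch below. It assumes the language model has already produced a small dictionary (the "sequence", "database", and "max_hits" keys are hypothetical), and it only builds the form body for NCBI's BLAST URL API without sending anything. The nucleotide-versus-protein check is a rough heuristic, not SeqScope's actual logic.

```python
from urllib.parse import urlencode

# Endpoint the encoded body would be POSTed to (no request is made here).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"
NUCLEOTIDES = set("ACGTUN")


def guess_program(sequence):
    """Pick blastn for nucleotide-looking input, blastp otherwise (heuristic)."""
    return "blastn" if set(sequence) <= NUCLEOTIDES else "blastp"


def build_submission(request):
    """Turn a structured request (hypothetical agent output) into a Put body."""
    sequence = "".join(request["sequence"].split()).upper()
    if not sequence.isalpha():
        raise ValueError("sequence must contain only letters")
    program = guess_program(sequence)
    default_db = "nt" if program == "blastn" else "nr"
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": request.get("database", default_db),
        "QUERY": sequence,
        "HITLIST_SIZE": request.get("max_hits", 10),
    }
    return urlencode(params)


# Example of what the agent's parsing step might hand over.
body = build_submission({"sequence": "acgt acgt\nTTGA", "max_hits": 5})
```

Keeping this step deterministic means the language model only has to extract intent and parameters, while validation and query construction stay testable in ordinary code.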

Result

Researchers query genomic databases in plain English. Live demo running on Streamlit — no CLI, no code, no bioinformatics background required.

Stack

Python · LangChain · OpenAI API · Streamlit · Bioinformatics
04 / Services

Architectural Offerings

End-to-end data products — from raw pipeline to deployed model to executive dashboard.

Computer Vision

Custom vision models for object detection, classification, and medical imaging. From prototype to GCP-deployed API.

BI Dashboard Development

Executive-grade dashboards in Power BI, Tableau, and Looker Studio. Automated pipelines that replace manual reporting.

Cloud Model Deployment

End-to-end ML deployment on GCP and AWS. FastAPI microservices, Docker containers, scalable inference APIs.

ETL / ELT Automation

Production pipelines with Airflow, dbt, and Python. Transform messy, heterogeneous data into clean analytical layers.

AI Agent Development

Autonomous agents with LangChain for document analysis, database queries, and workflow automation using n8n.

Team Training & Mentoring

Workshops on Python, ML pipelines, and data culture. Hands-on training tailored to your team's current skill level.

05 / Contact

Initiate Synthesis

Open to consulting engagements, full-time roles, and complex data architecture challenges. Let's create something meaningful.