Salvatore Barbagallo

Bioinformatics & Data Science
Data scientist with 7+ years in clinical and genomics labs, now applying Python, R, SQL, and Bash to NGS and clinical data.

About

I have 7+ years of experience across clinical, stem-cell, and genomics laboratories, and now focus on bioinformatics and data science. I bridge GMP-grade wet-lab workflows (flow cytometry, NGS library preparation, sample tracking) with computational analysis so that biological and clinical data are reproducible, auditable, and actually useful.

I work with Python, R, SQL, and Bash to build analysis notebooks, small pipelines, and automation scripts for NGS data, QC, and reporting.

Python · R · SQL · Bash · Bioinformatics · NGS Data Analysis · NGS Library Prep (Illumina) · Flow Cytometry · Statistics · Tableau · Git · LIMS
Location: London, United Kingdom
Languages: Italian · English · Portuguese · Spanish
Open to: Bioinformatics & Data Science roles · Right to work in UK & EU

Projects

A selection of work spanning genomics, analytics, and scripting. Full list on GitHub.

RNA-seq Pipeline (Nextflow + Docker)

Containerised RNA-seq workflow: FastQC → Cutadapt → STAR alignment → featureCounts → MultiQC → DESeq2.

Details
  • Objective: Build a reproducible, portable RNA-seq pipeline runnable on any machine with Docker.
  • Approach: Implemented the workflow in Nextflow and executed with Docker to isolate dependencies; produces a MultiQC report for end-to-end QC review.
  • Outcome: One-command execution; standardised outputs for QC and quantification suitable for downstream DE analysis.
Proof run: successful Nextflow test execution of the RNA-seq pipeline (Docker profile)
Bioinformatics · Workflow Engineering · Nextflow · Docker · RNA-seq · FastQC · MultiQC · STAR · featureCounts · DESeq2 · Cutadapt · DSL2

M. tuberculosis WGS Variant Analysis Workflow (Galaxy)

Galaxy-based whole-genome sequencing workflow for Mycobacterium tuberculosis, covering QC, trimming, alignment, coverage assessment, variant calling, annotation, and IGV-supported interpretation of resistance-associated loci.

Details
  • Objective: Build and run an end-to-end WGS workflow to identify and interpret resistance-associated variants in M. tuberculosis isolates.
  • Approach: Processed short-read sequencing data in a single Galaxy workflow: data retrieval (fasterq-dump); QC and trimming (Falco, fastp, Cutadapt); alignment (BWA-MEM2); BAM processing (Picard SortSam, Picard MarkDuplicates); mapping and coverage assessment (SAMtools flagstat, SAMtools idxstats, mosdepth); variant calling and filtering (bcftools mpileup, bcftools call, bcftools filter); annotation (bcftools csq, SnpEff, SnpSift); aggregated reporting (MultiQC); and visual validation of variants in IGV.
  • Outcome: Produced QC and mapping summaries, variant tables, resistance interpretation summaries, and IGV validation images for clinically relevant loci.
IGV validation example from the M. tuberculosis WGS workflow: atypical katG codon 295 variation
Bioinformatics · WGS · Galaxy · fasterq-dump · Falco · fastp · Cutadapt · BWA-MEM2 · Picard · SAMtools · mosdepth · bcftools · SnpEff · SnpSift · MultiQC · IGV

Salifort Motors - Employee Attrition Prediction (Classification Model)

Predictive modelling to understand drivers of employee turnover and inform retention strategy.

Details
  • Objective: Build a predictive model to understand the factors influencing employee attrition.
  • Approach: Cleaned and transformed HR datasets with Pandas, performed exploratory analysis, engineered features, and implemented classification models using scikit-learn.
  • Outcome: Delivered actionable insights and recommendations to improve workforce retention strategies.
Data Science · Machine Learning · Google Capstone · Python · scikit-learn · Pandas · NumPy · SciPy · Statsmodels · Matplotlib · XGBoost

TikTok - Social Media Engagement Analysis (EDA & Visualisation)

Exploratory analysis of engagement metrics to uncover content trends and optimisation levers.

Details
  • Objective: Analyse user engagement metrics to identify content trends on TikTok.
  • Approach: Processed raw CSV datasets, generated visualisations with Matplotlib, and applied descriptive statistics to uncover patterns in user behaviour.
  • Outcome: Produced data-driven recommendations to optimise content strategy and boost audience engagement.
Data Analytics · Exploratory Data Analysis · Google Capstone · Python · Pandas · Matplotlib · Seaborn · Plotly · SciPy

Bellabeat - Wearable Health Data Analysis

R + SQL workflow on wearable device data; built dashboards to surface activity & health patterns.

Details
  • Objective: Explore wearable device data to identify activity patterns and health trends.
  • Approach: Used R with dplyr and SQLite for data cleaning and summarisation; designed interactive Tableau dashboards for insight presentation.
  • Outcome: Suggested targeted product features and marketing approaches based on data-driven evidence.
Bellabeat Tableau Dashboard
View interactive Tableau dashboard ↗
Profit Margin Dashboard
View interactive Tableau dashboard ↗
Data Analytics · Business Intelligence · Google Capstone · R · SQL · dplyr · Tableau

AWS Managed Services - Cloud Migration Architecture

High-level AWS architecture design for migrating on-premises workloads to a cloud-native, fully managed solution, ensuring scalability, fault-tolerance, and operational efficiency.

Details
  • Objective: Migrate two on-prem workloads - a three-tier web application and a Hadoop-based analytics environment - into a modern AWS environment with managed services.
  • Approach: Designed an end-to-end cloud solution using AWS managed services:
    • Web Application Architecture:
      • Frontend hosted on Amazon S3 with CloudFront for global HTTPS delivery.
      • Backend containerised and run on ECS with Fargate behind an Application Load Balancer (ALB).
      • Database migrated to Amazon Aurora MySQL (Multi-AZ) with ElastiCache Redis for caching and SQS for asynchronous decoupling.
      • AWS Secrets Manager used for credential management and CloudWatch/X-Ray for monitoring.
    • Data Analytics Architecture:
      • Replaced the on-prem Hadoop environment with Amazon EMR (Spark/Hive) for scalable distributed data processing.
      • Created an S3 Data Lake as the central repository with metadata managed by AWS Glue Data Catalog.
      • Used Athena for serverless queries, Redshift for data warehousing, and QuickSight for BI dashboards.
      • Data ingested from on-prem via AWS DataSync.
  • Integration Flow: CloudFront routes traffic to ALB → ECS → Aurora, with ElastiCache and SQS optimising backend performance. Simultaneously, DataSync moves data to S3 for analysis via Glue → EMR → Athena → Redshift → QuickSight.
  • Outcome:
    • Decoupled, fault-tolerant architecture
    • Multi-AZ high availability
    • Cloud-native modernisation of both workloads
    • Reduced operational overhead with managed services
    • Integrated observability and security across layers
AWS architecture diagram: migration solution
Cloud Architecture · Architecture Design · AWS · Data Infrastructure · CloudFront · S3 · ECS · Fargate · Aurora MySQL · ElastiCache · SQS · EMR · Glue · Athena · Redshift · QuickSight

Google Fiber - Telecom KPI Dashboard & SQL Analytics

Stakeholder-driven KPI dashboard in Tableau, combining regional datasets and turning them into an executive-ready performance view.

Details
  • Objective: Build a BI solution that communicates market performance and insights for Google Fiber's leadership team.
  • Approach: Merged multiple regional CSV datasets directly in Tableau, cleaned and standardised columns, and created interactive dashboards visualising KPIs such as revenue, margin, and customer trends.
  • Outcome: Delivered a clear, visually consistent dashboard highlighting business performance across regions and channels.
Google Fiber Tableau Dashboard
View interactive Tableau dashboard ↗
Business Intelligence · Dashboarding · Google BI Capstone · Tableau · Data Visualization · KPI Design · SQL

Experience

Specialist Biomedical Scientist - UCL Hospitals · Stem Cell Laboratory

Sep 2021 - Dec 2025 · London, UK
  • Processed and cryopreserved PBSCs, bone marrow, and DLI products, and performed CD34+ enrichment, for transplant procedures.
  • Maintained Grade A clean room sterility via protocols and agar plate assessments.
  • Analysed CD3+/CD34+ populations by flow cytometry (MACSQuant); delivered data critical to patient treatment.
  • Supported 11 active clinical trials, including CAR-T (Yescarta, Novartis), antibody, and genetic therapies.
  • Built stock-management and training-tracking tools; reduced inventory errors by ~30% and saved ~10 hrs/week.

Laboratory Scientist - CooperGenomics

Jul 2019 - Sep 2021 · London, UK
  • Processed embryo samples for PGT-A, PGT-SR, and PGT-M, including whole-genome amplification (WGA) and gel electrophoresis.
  • Conducted NGS library preparation for whole-genome sequencing (manual: 96 samples/run; automated: 192 samples/run) with strict QC adherence.
  • Maintained Mosquito HV & Dragonfly Discovery liquid handlers; optimised run reliability.
  • Analysed NGS run metrics and generated patient reports (>500/week).
  • Authored and reviewed SOPs across the workflow.

Biomedical Laboratory Assistant - Cytology - Leicester Royal Infirmary

Dec 2018 - Jun 2019 · Leicester, UK
  • Managed sample reception and prepared specimens for Papanicolaou staining.
  • Maintained reagents and ensured sample integrity end-to-end.

Education

  • MSc, Bioinformatics - Atlantic Technological University (Sep 2025 - Present)
  • MSc, Cell & Gene Therapy - University College London (2021-2023)
    Dissertation: Expansion and Preservation of Haematopoietic Potential in Human Amniotic Fluid Stem Cells for Therapeutic Applications
  • BSc, Biomedical Science - University of Catania (2014-2017)
    Dissertation: Cytotoxicity assays using SIRC, ARPE-19, and HRPE cells

Earlier Research Internships

Research internships completed during the BSc in Biomedical Science at the University of Catania, building a foundation in genomics, molecular diagnostics, and analytical biochemistry.

Pharmacology - Torre Biologica · Dept. of Biomedical & Biotechnological Sciences (Oct 2016 - Sep 2017)
  • Performed cytotoxicity assays using ATP-lite on SIRC, ARPE-19, and HRPE cell lines.
  • Designed and analysed quantitative experiments forming the basis of an undergraduate thesis.
Oncology & Experimental Haematology - Dept. of Clinical & Experimental Medicine (May 2016 - Jul 2016)
  • Applied PCR, electrophoresis, western blot, chemiluminescence, and NGS for diagnostic research.
  • Processed fluorescence-labelled DNA samples for gene expression and mutation studies.
Medical Genetics - University Hospital of Catania (Jan 2015 - Apr 2015)
  • Extracted DNA from peripheral blood and quantified it by spectrophotometric analysis.
  • Performed CGH-SNP array analysis for chromosomal abnormalities and variant detection.
  • Submitted clinical data and coordinated with the medical genetics team.
Additional Rotations - University of Catania (2014 - 2017)
  • Completed short placements across Forensic Toxicology, Pathology, Immunohematology, and Public Health.
  • Gained hands-on exposure to GC-MS analysis, immunohistochemistry, FISH, and analytical chemistry techniques.

Certificates

  • IBM: Data Engineering (in progress)
  • Google: Data Analytics; Advanced Data Analytics; IT Automation with Python; Project Management; Business Intelligence
  • Google Cloud: Architecting with Google Kubernetes Engine
  • Amazon Web Services (AWS): Cloud Practitioner Essentials; Cloud Solutions Architect
  • Johns Hopkins University: Genomic Data Science Specialization
  • SAS: SAS Programming 1: Essentials; SAS Programming 2: Data Manipulation Techniques
  • Coursera: Access Bioinformatics Databases with Biopython
  • Wellcome: Bioinformatics for Biologists: An Introduction to Linux, Bash Scripting, and R; Analysing and Interpreting Genomics Datasets
  • freeCodeCamp: Data Analysis with Python; Relational Databases; Scientific Computing with Python
  • DE<code>LIFE: Genomes, Networks & Pathways; Data Science & Machine Learning with Python
  • Le Wagon: Data Visualization with Tableau

Contact