Salvatore Barbagallo

Data Science & Bioinformatics
Bridging biomedical science and computational analysis through Python, R, SQL, and genomic data science

About

Biomedical Scientist with 7+ years in clinical labs, now transitioning into data science. I combine wet-lab expertise (flow cytometry, NGS library prep) with computational skills to analyse biological data and streamline workflows.

I work across Python, R, SQL, and Bash to turn biological data into actionable insight.

Python · R · SQL · Bash · NGS Data Analysis · Tableau · Git · Flow Cytometry · NGS Library Prep (Illumina) · Statistics · LIMS
Location: London, United Kingdom
Languages: Italian · English · Portuguese · Spanish
Open to: Data Science · Bioinformatics

Projects

A selection of work spanning genomics, analytics, and scripting. Full list on GitHub.

Genomic Data Science Specialization

End-to-end workflows for genomic analysis: alignment, assembly, RNA-seq, variant calling, and DB querying.

Details
  • Objective: Gain proficiency in computational genomics through applied projects spanning multiple bioinformatics subfields.
  • Approach: Completed a series of modules covering sequence alignment, genome assembly, RNA-seq analysis, variant calling, data visualisation, and database querying. Applied Python, R, and Biopython to process and analyse large genomic datasets.
  • Outcome: Built a diverse portfolio of scripts and workflows applicable to real-world genomics research, showcasing versatility across topics such as quality control, functional annotation, and statistical interpretation of biological data.
Python · R · Biopython · Bioconductor · Bowtie/Bowtie2 · Burrows-Wheeler Aligner (BWA) · SAMtools · BEDtools · VCFtools · FastQC · Cufflinks/Cuffdiff · TopHat · STAR · HISAT2 · DESeq2

Salifort Motors, Attrition Model - Google Advanced Data Analytics Project

Predictive modelling to understand drivers of employee turnover and inform retention strategy.

Details
  • Objective: Build a predictive model to understand factors influencing employee attrition.
  • Approach: Cleaned and transformed HR datasets with Pandas, performed exploratory analysis, engineered features, and implemented classification models using scikit-learn.
  • Outcome: Delivered actionable insights and recommendations to improve workforce retention strategies.
Python · scikit-learn · Pandas · NumPy · SciPy · Statsmodels · Matplotlib · XGBoost

TikTok Engagement Analysis - Google Advanced Data Analytics Project

Exploratory analysis of engagement metrics to uncover content trends and optimisation levers.

Details
  • Objective: Analyse user engagement metrics to identify content trends on TikTok.
  • Approach: Processed raw CSV datasets, generated visualisations with Matplotlib, and applied descriptive statistics to uncover patterns in user behaviour.
  • Outcome: Produced data-driven recommendations to optimise content strategy and boost audience engagement.
Python · Pandas · Matplotlib · Seaborn · Plotly · SciPy

Bellabeat Case Study - Google Data Analytics Project

R + SQL workflow on wearable device data; built dashboards to surface activity & health patterns.

Details
  • Objective: Explore wearable device data to identify activity patterns and health trends.
  • Approach: Used R with dplyr and SQLite for data cleaning and summarisation; designed interactive Tableau dashboards for insight presentation.
  • Outcome: Suggested targeted product features and marketing approaches based on data-driven evidence.
  • Bellabeat Tableau Dashboard: View interactive Tableau dashboard ↗
  • Profit Margin Dashboard: View interactive Tableau dashboard ↗
R · SQLite · dplyr · Tableau

Solution Architect - AWS Migration Design

High-level AWS architecture design for migrating on-premises workloads to a cloud-native, fully managed solution, ensuring scalability, fault-tolerance, and operational efficiency.

Details
  • Objective: Migrate two on-prem workloads - a three-tier web application and a Hadoop-based analytics environment - into a modern AWS environment with managed services.
  • Approach: Designed an end-to-end cloud solution using AWS managed services:
    • Web Application Architecture:
      • Frontend hosted on Amazon S3 with CloudFront for global HTTPS delivery.
      • Backend containerised and run on ECS with Fargate behind an Application Load Balancer (ALB).
      • Database migrated to Amazon Aurora MySQL (Multi-AZ) with ElastiCache Redis for caching and SQS for asynchronous decoupling.
      • AWS Secrets Manager used for credential management and CloudWatch/X-Ray for monitoring.
    • Data Analytics Architecture:
      • Replaced Hadoop with Amazon EMR (Spark/Hive) for scalable distributed data processing.
      • Created an S3 Data Lake as the central repository with metadata managed by AWS Glue Data Catalog.
      • Used Athena for serverless queries, Redshift for data warehousing, and QuickSight for BI dashboards.
      • Data ingested from on-prem via AWS DataSync.
  • Integration Flow: CloudFront routes traffic to ALB → ECS → Aurora, with ElastiCache and SQS optimising backend performance. Simultaneously, DataSync moves data to S3 for analysis via Glue → EMR → Athena → Redshift → QuickSight.
  • Outcome: Achieved full migration with:
    • Decoupled, fault-tolerant architecture
    • Multi-AZ high availability
    • Cloud-native modernisation of both workloads
    • Reduced operational overhead with managed services
    • Integrated observability and security across layers
AWS Architecture Diagram - Migration Solution
AWS · CloudFront · S3 · ECS Fargate · Aurora MySQL · ElastiCache · SQS · EMR · Glue · Athena · Redshift · QuickSight

Google Fiber - Business Intelligence Capstone

Final capstone for the Google Business Intelligence Certificate, showcasing stakeholder-driven dashboard design and data storytelling using Tableau.

Details
  • Objective: Build a BI solution that communicates market performance and insights for Google Fiber's leadership team.
  • Approach: Merged multiple regional CSV datasets directly in Tableau, cleaned and standardised columns, and created interactive dashboards visualising KPIs such as revenue, margin, and customer trends.
  • Outcome: Delivered a clear, visually consistent dashboard highlighting business performance across regions and channels.
Google Fiber Tableau Dashboard
View interactive Tableau dashboard ↗
Tableau · Data Visualization · Business Intelligence · Google Fiber

Experience

Specialist Biomedical Scientist - UCL Hospitals · Stem Cell Laboratory

Sep 2021 - Present · London, UK
  • Processed and cryopreserved PBSCs, bone marrow, and DLI products, and performed CD34+ enrichment, for transplant procedures.
  • Maintained Grade A clean room sterility via protocols and agar plate assessments.
  • Analysed CD3+/CD34+ populations via MACSQuant; delivered critical treatment data.
  • Supported 11 active clinical trials, including CAR-T (Yescarta, Novartis), antibody, and genetic therapies.
  • Built stock-management and training-tracking tools; reduced inventory errors by ~30% and saved ~10 hrs/week.

Laboratory Scientist - CooperGenomics

Jul 2019 - Sep 2021 · London, UK
  • Processed embryo samples for PGT-A / PGT-SR / PGT-M, performing WGA and gel electrophoresis.
  • Conducted NGS library preparation (manual: 96 samples/run; automated: 192 samples/run) with strict QC adherence for Whole Genome Sequencing.
  • Maintained Mosquito HV & Dragonfly Discovery liquid handlers; optimised run reliability.
  • Analysed NGS run metrics and generated patient reports (>500/week).
  • Authored and reviewed SOPs across the workflow.

Biomedical Laboratory Assistant - Cytology - Leicester Royal Infirmary

Dec 2018 - Jun 2019 · Leicester, UK
  • Managed sample reception and prepared specimens for Papanicolaou staining.
  • Maintained reagents and ensured sample integrity end-to-end.

Education

  • MSc, Bioinformatics - Atlantic Technological University (Sep 2025 - Present)
  • MSc, Cell & Gene Therapy - University College London (2021-2023)
    Dissertation: Expansion and Preservation of Haematopoietic Potential in Human Amniotic Fluid Stem Cells for Therapeutic Applications
  • BSc, Biomedical Science - University of Catania (2014-2017)
    Dissertation: Cytotoxicity assays using SIRC, ARPE-19, and HRPE cells

Earlier Research Internships

Research internships completed during the BSc in Biomedical Science at the University of Catania, building a foundation in genomics, molecular diagnostics, and analytical biochemistry.

Pharmacology - Torre Biologica · Dept. of Biomedical & Biotechnological Sciences (Oct 2016 - Sep 2017)
  • Performed cytotoxicity assays using ATP-lite on SIRC, ARPE-19, and HRPE cell lines.
  • Designed and analysed quantitative experiments forming the basis of an undergraduate thesis.
Oncology & Experimental Haematology - Dept. of Clinical & Experimental Medicine (May 2016 - Jul 2016)
  • Applied PCR, electrophoresis, western blot, chemiluminescence, and NGS for diagnostic research.
  • Processed fluorescence-labelled DNA samples for gene expression and mutation studies.
Medical Genetics - University Hospital of Catania (Jan 2015 - Apr 2015)
  • Extracted and quantified DNA from peripheral blood using spectrophotometric analysis.
  • Performed CGH-SNP array analysis for chromosomal abnormalities and variant detection.
  • Submitted clinical data and coordinated with the medical genetics team.
Additional Rotations - University of Catania (2014 - 2017)
  • Short placements across Forensic Toxicology, Pathology, Immunohematology, and Public Health.
  • Hands-on exposure to GC-MS analysis, immunohistochemistry, FISH, and analytical chemistry techniques.

Certificates

  • Google: Data Analytics; Advanced Data Analytics; IT Automation with Python; Project Management; Business Intelligence
  • Google Cloud: Architecting with Google Kubernetes Engine
  • Amazon Web Services (AWS): Cloud Practitioner Essentials; Cloud Solutions Architect
  • Johns Hopkins University: Genomic Data Science Specialization
  • Coursera: Access Bioinformatics Databases with Biopython
  • Wellcome: Bioinformatics for Biologists: An Introduction to Linux, Bash Scripting, and R; Analysing and Interpreting Genomics Datasets
  • freeCodeCamp: Data Analysis with Python; Relational Databases; Scientific Computing with Python
  • DE<code>LIFE: Genomes, Networks & Pathways; Data Science & Machine Learning with Python
  • Le Wagon: Data Visualization with Tableau

Contact