Basheer Becerra's Personal Site

Thank you for visiting my personal website. To view my CV, please click here or view my one-page resume here.

Personal website last updated: 12/5/2021

Hello World!


My name is Basheer Becerra (pronounced as BUH-sheer beh-SEH-ruh).

I am currently a third-year PhD candidate in Bioinformatics & Integrative Genomics (BIG) at Harvard Medical School. I graduated from Illinois State University with a double major in computer science and statistics and a minor in biology.

My thesis work (with co-advisors Dr. Luca Pinello and Dr. Daniel Bauer) involves leveraging emerging CRISPR-Cas genome editing tools such as base editors to perform screens to understand the function of coding and non-coding sequences and SNPs. As induction of fetal hemoglobin offers an opportunity for therapeutic intervention in beta-hemoglobinopathies, these screens aim to identify regulatory elements of fetal hemoglobin. This work is a part of the NHGRI IGVF initiative. In this work, I specifically:
- Perform and develop generalizable screening approaches in the lab for characterizing non-coding and coding elements.
- Develop upstream and downstream computational methods for screen design and analysis, respectively.

Professional Summary

General summary of professional/research experiences and goals

G3 Bioinformatics Ph.D. student with experimental and computational skills, specifically in genome editing screens, statistics and machine learning, and software engineering.

Specifically, I am interested in developing screening approaches and computational methods used to study biological mechanisms (e.g. regulatory mechanisms of non-coding elements) and identify therapeutic targets (e.g. induction of fetal hemoglobin for beta-hemoglobinopathies).


Ganz, J., Maury, E.A., Becerra, B., Bizzotto, S., Doan, R.N., Kenny, C.J., Shin, T., Kim, J., Zhou, Z., Ligon, K.L. and Lee, E.A., 2021. Rates and patterns of clonal oncogenic mutations in the normal human brain. Cancer Discovery.

Rube, H.T., Rastogi, C., Feng, S., Kribelbauer, J.F., Li, A., Becerra, B., Melo, L.A., Do, B.V., Li, X., Adam, H.H. and Shah, N.H., 2021. Probing molecular specificity with deep sequencing and biophysically interpretable machine learning. bioRxiv.



PhD Rotation Period at Harvard Medical School September 2019 – June 2021
Rotation Student with Dr. Eliezer Van Allen at Dana Farber Cancer Institute
Variant calling using bulk whole exome sequences, bulk RNA-seq, and single-cell RNA-seq of prostate cancer patient samples to determine presence of clonal hematopoiesis of indeterminate potential (CHIP) somatic mutations.
Rotation Student with Dr. Christopher Walsh at Boston Children’s Hospital
Performed somatic variant calling of normal GTEx brain RNA-seq samples to determine association of somatic mutational burden with age. Exploration of allele-specific expression of GTEx bulk RNA-seq brain samples to distinguish somatic and germline variants. Integration of four published fetal brain single-cell RNA-seq datasets to call differentially expressed and co-expressed gene markers potentially identifying functionally important long non-coding RNAs.
Rotation Student with Dr. Peter J. Park at Harvard Medical School
Calling and validating copy-number variants in single neurons amplified using the Primary Template-directed Amplification (PTA) protocol to understand the extent of copy-number burden in neurotypical individuals.
Rotation Student with Dr. Luca Pinello at Massachusetts General Hospital
Developed a semi-supervised graph-based model to predict cell fate based on lineage barcoding information from single-cell LARRY-barcode data. Additionally, explored transcriptomic effects of CRISPR perturbations of putative enhancers related to fetal hemoglobin switching.

Data Scientist at Advance Trading Inc. August 2018 – May 2019
Hired as the first lead data scientist to support the company's transformation to data analytics and IT modernization. First, provided consultation on data engineering solutions for recording, storing, and analyzing large amounts of commodity trading data in Google Cloud Platform. Second, developed the first operationalized and internally deployed machine-learning model for commodity basis price prediction, leveraging company data and using advanced time-series and multi-task recurrent neural network models. This position demonstrates technical skills and problem solving in data engineering and machine learning, along with communication and collaboration between business and IT.

Computational Genomics Intern at Vertex Pharmaceuticals May 2019 - August 2019
Determined the transcriptomic effects of Vertex Pharmaceuticals drug candidates for alpha1-antitrypsin deficiency in mouse models, as a collaborative effort between the computational genomics team and the alpha1-antitrypsin deficiency biology team.

Biomedical Informatics Graduate Teaching Fellow - Fall 2020; Fall 2021
Teaching fellow for two courses with around 40 students: “BMI713 Computing Skills for Biomedical Sciences” which introduces data analysis using the R programming language, and “BMI715 Computational Statistics for Biomedical Sciences” which introduces probability theory and statistical hypothesis testing. Responsible for helping prepare lecture and assignment material as well as conducting recitation lessons and office hours.

Research Software Engineer at Columbia University December 2018 – Present
Full-stack software engineering to provide the research community with a comprehensive portal to store and view detailed sequence-to-affinity models of SELEX-seq datasets (i.e. transcription factor motifs). Front-end visualization (Angular 2+), back-end logic for access control and orchestration (Java Spring Framework), interaction with data repositories (PostgreSQL, Amazon S3, Redis, OAuth, etc.), deployment on cloud platforms (Amazon Web Services). Manuscript submitted; in review (Rube et al., bioRxiv 2021).

Quantitative Biology Researcher/Consultant at Cold Spring Harbor Laboratory June 2018 – December 2018
Developed a computational pipeline for analysis of tumor copy-number aberrations in exome sequences of patient-derived pancreatic cancer (PDAC) organoids and in multi-regional breast tumor exome sequences to predict drug response and patient prognosis.

IT Systems Intern at State Farm May 2015 – March 2018
- Optimization and debugging of internal software for claims representatives, saving around $70,000/yr in staff time through increased software efficiency and fewer software crashes.
- Development of internal Angular 2+ UI modules for calling image-recognition machine-learning models for claims-handling as well as mobile front-end interaction with back-end web services.
- Independently developed two heavily used web services, both deployed to production using the Java Spring Framework: a RESTful web service that conducts CRUD operations for claims file management, and a SOAP web service that dynamically generates “campaign IDs” to track the original source of web page visitors for marketing analytics.
- Contributed to front-end development of State Farm’s main customer-facing claims handling interface.
- Produced a 20-page research paper on the machine-learning capabilities of cloud platforms (such as AWS, GCP, Azure) in collaboration with Google vendors. Presented the results to several key stakeholders and directors to aid in cloud enablement and cloud platform selection.
- One of the winners of the internally-held State Farm Hackathon (with over 150 teams) on an application for leveraging internal analytics for social media automation; sponsored by the Chief Marketing Executive and worked with marketing directors and AVPs to plan development of the application.
- Awarded the 2017 State Farm Innovator and Q3 2017 Special Achievement Award as an intern.