Job Information
Dana-Farber Cancer Institute Scientific Data Engineer in Boston, Massachusetts
Located in Boston and the surrounding communities, Dana-Farber Cancer Institute brings together world-renowned clinicians, innovative researchers and dedicated professionals, allies in the common mission of conquering cancer, HIV/AIDS and related diseases. Combining extremely talented people with the best technologies in a genuinely positive environment, we provide compassionate and comprehensive care to patients of all ages; we conduct research that advances treatment; we educate tomorrow's physician/researchers; we reach out to underserved members of our community; and we work with amazing partners, including other Harvard Medical School-affiliated hospitals.
POSITION SUMMARY
The Targeted Protein Degradation (TPD) proteomics core at Dana-Farber Cancer Institute is a research core focused on the development and application of state-of-the-art mass spectrometry-based discovery pipelines for targeted protein degradation and molecular glues. The TPD proteomics core is closely affiliated with the Fischer lab and the chemical biology program and serves Dana-Farber researchers and discovery partnerships.
The Scientific Data Engineer works closely with the proteomics group to apply computational algorithms that integrate and analyze proteomics data alongside other large-scale biological profiling data sources. The position involves developing an integrated data framework, data visualization tools, and an interactive web-based user interface for efficiently navigating large collections of cellular proteomics profiles acquired under various perturbations, including chemical perturbation, in order to reveal insights into biomolecular interactions.
PRIMARY DUTIES AND RESPONSIBILITIES
Deploys and maintains commercial and open-source software for efficient data processing of high-throughput chemo-proteomics experiments.
Collaborates with cross-functional teams to design and implement high-throughput proteomics data analysis algorithms.
Creates customized analysis and visualization tools to streamline the integration of proteomics data with high-throughput biochemical and cellular screening data.
Develops and maintains databases for the storage and efficient retrieval of research data, ensuring the highest levels of data integrity.
Develops and maintains interactive web visualization interfaces for these databases.
Works with lab personnel to implement automated processing solutions for repeated manual tasks.
Rigorously follows best practices for software development.
Engages in continuous learning to stay updated with the latest advancements in artificial intelligence for small molecule drug discovery.
KNOWLEDGE, SKILLS, AND ABILITIES REQUIRED
Demonstrated experience developing documented ETL pipelines
Knowledge of professional software engineering practices, including coding standards, code reviews, source control management, build processes, testing, and devops
Able to deploy data pipelines and supporting infrastructure in on-premises and/or cloud-native environments
Strong organizational skills with the ability to prioritize and manage various tasks and projects reliably and in a timely manner
Requires minimal direction from leadership and possesses the ability to adapt to new challenges as they arise
Excellent interpersonal skills and a passion for innovative solutions
Self-motivated with the ability to produce clear documentation and generate results with a clear and professional presentation
Detail-oriented with excellent communication skills
Able to identify, communicate, and advocate for best practices
Can work independently and efficiently to meet aggressive timelines where needed
Bachelor’s degree in data engineering, computer science, biomedical informatics, health services research, epidemiology, biostatistics, public health, or a subject area with a strong focus on the management and research use of clinical data. A Master’s degree may substitute for 2 years of experience.
4 years of hands-on programming and data analysis experience is required, of which at least 2 years must involve hands-on experience in multi-omics data analysis and database management with expertise in common bioinformatics algorithms, Python, and shell scripting.
Solid understanding of database management, server-client program development, and web visualization technologies. Strong skills in SQL on a database such as Snowflake, SQL Server, Oracle, etc.
Proficiency with Python, R, Scala, JavaScript or another modern programming language used in data engineering
Strong skills in setting up and maintaining relational databases.
Understanding of interactions between small molecules and proteins in biological systems
Excellent teamwork and communication skills.
Dana-Farber Cancer Institute is an equal opportunity employer and affirms the right of every qualified applicant to receive consideration for employment without regard to race, color, religion, sex, gender identity or expression, national origin, sexual orientation, genetic information, disability, age, ancestry, military service, protected veteran status, or other characteristics protected by law.