Lifeng Jin graduated with a PhD in 2020, and is currently working at Scale AI. Lifeng agreed to talk about what he’s been up to and give advice for current students.
What kind of work do you do?
I am a Machine Learning Research Scientist at Scale AI. My job has two main parts: one is to discuss data pipeline needs with external researchers from big labs; the other is to conduct research on how to improve data quality, diversity, and so on.
What are your responsibilities and current projects?
My responsibilities include writing proposals and design documents for novel data collection pipelines that support new large language model (LLM) applications, such as speech-to-speech and targeted reinforcement learning from human feedback (RLHF). I also look at data demands across the industry, anticipate which new data types may be needed in the future, and identify current problems and how to address them.
One research project I am currently involved in aims to detect artifacts in human-created datasets. Such artifacts can be introduced by arbitrary decisions made at the beginning of a data collection pipeline, and they are very harmful to model training.
What do you like about this work?
It is interesting because it is at the core of the current LLM revolution. Data is definitely driving many of the improvements we see in commercial LLMs, and it has gradually become the bottleneck.
How did you get into this line of work?
I have been doing NLP-related research since I graduated, so this is consistent with what I have been doing.
What are the transferable skills that you learned in your graduate program that you are using in your current career?
Most of the research-related skills are very useful, since my work now is closely related to what I was doing in the graduate program. Conducting research, finding novel problems, writing papers, giving presentations, communicating, thinking critically, testing hypotheses through experiments, and doing quantitative analysis are all very useful.
I think it is important to start preparing early. Publish papers and practice technical skills; both are important for landing a good job!