Nikita Pavlichenko

Senior Machine Learning Engineer

Developing cutting-edge AI applications with expertise in LLMs, natural language processing, and crowdsourcing technologies. Currently building advanced code completion models at JetBrains AI.


Experience

Senior ML Engineer

JetBrains, Berlin, Germany

March 2024 - Present

  • Developing LLMs for Cloud Code Completion in JetBrains AI.
  • Leading the LLM training project, responsible for data processing, pre-training, fine-tuning, and alignment.
  • Trained models that are deployed in JetBrains' top-selling IDEs, serve millions of users worldwide, and are a core feature of the JetBrains AI product.

Projects

Mellum: JetBrains LLM For Developers

October 2024

Contributed to Mellum, JetBrains' proprietary large language model specifically built for developers. The model is designed to deeply understand code, program semantics, and development workflows to provide intelligent coding assistance.

Large Language Models, AI Engineering, Developer Tools, Machine Learning

Toloka LLM Leaderboard

July 2023

Developed the Toloka LLM Leaderboard, a benchmark that compares open large language models using human judgments. The platform enables reliable comparison of model performance based on crowdsourced assessments.

LLM Evaluation, HuggingFace, Human Evaluation, Crowdsourcing

Crowd-Kit Python Library

May 2022

Created Crowd-Kit, an open-source Python library for aggregating crowdsourced data. The library implements methods for consolidating annotations in classification, regression, ranking, and pairwise comparison tasks, improving the quality of labeled data for ML applications.

Python, Machine Learning, Crowdsourcing, Data Aggregation

CrowdSpeech Dataset

July 2021

Developed the CrowdSpeech dataset, a collection of crowdsourced audio transcriptions from non-professional workers. This resource provides valuable data for training and evaluating speech recognition systems and studying annotation quality control methods.

Dataset Creation, Speech Recognition, Crowdsourcing, Audio Processing

Publications

My research contributions span multiple domains including machine learning, crowdsourcing, and AI-generated content. You can find my complete publication history on Google Scholar.

Best Prompts for Text-to-Image Models and How to Find Them

SIGIR 2023

A novel approach for optimizing text prompts for text-to-image generation models using crowdsourcing techniques and evolutionary algorithms.

AI, Generative Models, Prompting

CrowdSpeech and Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription

NeurIPS Datasets and Benchmarks 2021

A benchmark dataset for evaluating crowdsourced audio transcription methods, featuring diverse languages and recording conditions.

NLP, Crowdsourcing, Datasets

Spherical convolutions on molecular graphs for protein model quality assessment

Machine Learning: Science and Technology 2021

S-GCN, a deep learning model that operates on molecular graphs to predict protein model quality and achieved state-of-the-art results on the CASP model quality assessment (MQA) challenge.

Graph ML, Bioinformatics, GCN

Contact

Let's Connect

I'm always open to discussing new projects, opportunities, or partnerships. Feel free to reach out through any of these channels!

Need my resume?

Download for complete details on my experience and skills.
