Nikita Pavlichenko

Senior Research Engineer

Developing cutting-edge AI applications with expertise in LLMs, Natural Language Processing, and crowdsourcing technologies. Currently building advanced code completion models at JetBrains AI.

Experience

Senior Research Engineer

JetBrains, Berlin, Germany

March 2024 - Present

Develop LLMs for cloud code completion in JetBrains AI.
Author of the Mellum LLMs.
Lead the LLM training project, covering data processing, pre-training, fine-tuning, and alignment.
Trained models deployed in JetBrains' best-selling IDEs, where they serve millions of users worldwide as a core feature of the JetBrains AI product.

Projects

Mellum: JetBrains LLM For Developers

October 2024

Contributed to Mellum, JetBrains' proprietary large language model specifically built for developers. The model is designed to deeply understand code, program semantics, and development workflows to provide intelligent coding assistance.

Large Language Models, AI Engineering, Developer Tools, Machine Learning

Toloka LLM Leaderboard

July 2023

Developed the Toloka LLM Leaderboard, a comprehensive benchmarking tool for evaluating open large language models through human evaluations. This platform enables reliable comparison of model performance using crowdsourced assessments.

LLM Evaluation, HuggingFace, Human Evaluation, Crowdsourcing

Crowd-Kit Python Library

May 2022

Created Crowd-Kit, an open-source Python library for crowdsourced data aggregation. The library implements various data annotation consolidation methods for classification, regression, ranking, and pairwise comparison tasks, enhancing data quality for ML applications.

Python, Machine Learning, Crowdsourcing, Data Aggregation

CrowdSpeech Dataset

July 2021

Developed the CrowdSpeech dataset, a collection of crowdsourced audio transcriptions from non-professional workers. This resource provides valuable data for training and evaluating speech recognition systems and studying annotation quality control methods.

Dataset Creation, Speech Recognition, Crowdsourcing, Audio Processing

Publications

My research contributions span multiple domains including machine learning, crowdsourcing, and AI-generated content. You can find my complete publication history on Google Scholar.

Best Prompts for Text-to-Image Models and How to Find Them

SIGIR 2023

A novel approach for optimizing text prompts for text-to-image generation models using crowdsourcing techniques and evolutionary algorithms.

AI, Generative Models, Prompting

CrowdSpeech and Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription

NeurIPS Datasets and Benchmarks 2021

A benchmark dataset for evaluating crowdsourced audio transcription methods, featuring diverse languages and recording conditions.

NLP, Crowdsourcing, Datasets

Spherical convolutions on molecular graphs for protein model quality assessment

Machine Learning: Science and Technology 2021

A deep learning model operating on molecular graphs (S-GCN) for protein model quality prediction that achieved state-of-the-art results on the CASP MQA challenge.

Graph ML, Bioinformatics, GCN

Contact

Let's Connect

I'm always open to discussing new projects, opportunities, or partnerships. Feel free to reach out through any of these channels!

Email: nikita.pavlichenko@gmail.com

LinkedIn: linkedin.com/in/nikita-pavlichenko

GitHub: github.com/pilot7747
