Multi-Document Summarization and Ranking Optimisation

Engineered a state-of-the-art multi-document summarization model for news aggregation, achieving an 87% human acceptance rate with fewer than 1% critical errors.
Designed automated news timelines for evolving events, resulting in a 2.8% increase in content depth and a 1.4% boost in user time spent.
Optimized ranking algorithms with a CTR prediction model, increasing daily active users by 4.3% and user engagement by 2.1%.
Created a novel similarity scoring formula, improving F1-score from 0.91 to 0.95 for news clustering.

Andrey worked on this case as an ML engineer at Yandex Zen.

ML engineer

Natural Language Processing

Recommendation Systems

Entertainment

Media

Global

Enterprise

News & Content Platform

Web App

Python

Pandas

Hydra

TensorFlow

SciPy

Docker

TensorBoard

Wandb

B2C

Andrey

Machine Learning Engineer at Yandex Zen

Andrey's cases

Natural Language Processing

Global

Research and Development

LLM Pretraining Pipeline

Data Pipeline

Python

Large-Scale Text Corpus Deduplication and Dataset Enhancement

Developed and deployed text corpus deduplication using a suffix array algorithm on MapReduce, boosting assessor F1-score from 0.77 to 0.82.
Trained a classifier to improve benchmark coverage, enhancing dataset relevance for pretraining tasks.
Created a dataset augmentation pipeline with Back Translation, increasing pretraining robustness and generalization.
Enhanced document parsing quality, resulting in 3% faster model convergence and improved resource efficiency.
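
The suffix-array deduplication named in the first bullet can be illustrated with a minimal single-machine sketch. This is not the deployed MapReduce implementation, which the source does not describe; the function names (`build_suffix_array`, `find_repeated_spans`) and the `min_len` threshold are hypothetical and chosen only for illustration.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return suffix start offsets in lexicographic order of the suffixes.

    Naive construction by sorting suffix slices; adequate for a sketch,
    far too slow for a large corpus.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_repeated_spans(text: str, min_len: int) -> set[str]:
    """Collect substrings of at least min_len characters that occur more than once.

    Suffixes that share a long prefix end up next to each other in the
    suffix array, so comparing neighbours is enough to surface repeats.
    """
    suffix_array = build_suffix_array(text)
    repeated = set()
    for prev, cur in zip(suffix_array, suffix_array[1:]):
        k = common_prefix_len(text[prev:], text[cur:])
        if k >= min_len:
            repeated.add(text[cur:cur + k])
    return repeated


if __name__ == "__main__":
    sample = "breaking news today. breaking news tomorrow."
    for span in sorted(find_repeated_spans(sample, min_len=10), key=len, reverse=True):
        print(repr(span))
```

In a real pipeline the repeated spans would then be removed or used to flag near-duplicate documents; how that step was done in this case is not stated in the source.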

ML engineer, Yandex GPT

Language Translation Systems

Telecom

Global

Research and Development

C to Eolang Compiler

Compiler

C++

C to Eolang Compiler Development

Implemented translation of all basic data types from the Clang AST to equivalent constructs in Eolang.
Developed a mechanism for translating multidimensional arrays to Eolang.
Developed a mechanism for translating Enum types to Eolang.

Product Search and Logistics Automation

Developed a search robot for products, increasing cold client conversions by 4%.
Created a reporting system that lets clients monitor item positions.
Developed an advanced route-planning algorithm for courier logistics, increasing daily pickup points by 13%.
Added invoice generation for product stock, improving the speed and accuracy of the product acceptance process by 15%.

ML engineer, WBprod

Similar cases

Natural Language Processing

Media

Global

Startup

Video Generation Platform

Web App

Python

NLP Pipeline for Video Generation

Developed an NLP pipeline for generating videos from URLs.
Implemented a conversation quality metric using Python and FastAPI.
Deployed solutions via Docker on private Amazon servers.
Developed an internal user interface dashboard in JavaScript.
Fine-tuned GPT-Neo for conditional text generation.

Senior Machine Learning Engineer, Oxolo

Recommendation Systems

AI/ML Solutions for Business Optimization

Developed AI/ML solutions to help businesses optimize processes and improve efficiency.
Implemented fraud detection systems to predict anomalies in transactions.
Developed demand forecasting models to optimize inventory and supply chains.
Created customer churn prediction models to analyze data and retain users.
Built recommendation systems to personalize content and products.
Conducted sentiment analysis using NLP techniques.

Data Scientist, Pehard Studios

Payment Processing System

Web App

Python

Payment API Integration and Automation

Integrated a payment API client with a Telegram chat-bot, enabling 500 users to initiate over 300 payments daily.
Developed backend Django ORM logic to automatically generate and persist more than 10,000 payment records annually.
Implemented a cron-based scheduler for payment status updates, achieving synchronization latency under 2 seconds.
Ensured 99.9% accuracy in payment status tracking, reducing payment dispute cases by 40%.

Python Developer, PeoplePro Tech