Resume
About Me
I am a dedicated AI Developer with 2+ years of industry and 3+ years of academic experience in machine learning, deep learning, NLP, and computer vision. My expertise includes speech-to-text (STT), large language models (LLMs), DeepFake detection, and AI-generated content classification. I have successfully built and deployed real-time AI models for voice authentication, phishing detection, and AI-driven audio classification.
I specialize in model quantization (ONNX, TensorFlow Lite, TensorRT), optimization, and scalable AI solutions for mobile and server deployments. My work focuses on multilingual speech recognition, AI-based security systems, and real-time inference acceleration, ensuring efficient and high-performance AI applications.
Skills Summary
- Programming Languages: Python, C/C++, Java
- Databases & Data Processing: MySQL, PostgreSQL, PySpark
- ML & AI Frameworks: PyTorch, TensorFlow, HuggingFace Transformers, Scikit-learn, PyTorch Lightning
- Speech & NLP: Whisper, KoBERT, Transformers, Speech-to-Text (STT), Large Language Models (LLMs)
- MLOps & Optimization: Docker, Kubernetes, MLflow, FastAPI, TorchServe, TensorRT, ONNX, TensorFlow Lite
- Development Tools: Git/GitHub, CI/CD, Docker Compose
- Cloud & Deployment: AWS EC2, GCP, Edge AI, On-device AI model deployment for mobile applications
Main Competencies
- Computer Vision & Image Processing: Object Detection, Object Tracking, OCR, Image Restoration & Enhancement, Medical Imaging
- Speech & NLP: STT, Natural Language Processing, Large Language Models (LLMs), Vision-Language Models, Generative AI
- AI Model Development & Optimization: Model Quantization, Real-time AI Systems, DeepFake Detection, Clustering, Re-Identification
- End-to-End AI Solutions: Building Scalable AI Pipelines, Deployment Pipelines, GCP & Cloud AI Model Deployment
Work Experience
AI Developer
Museblossome | Nov 2024 - Present
- DeepVoice – Real-time Voice Phishing Detection
- Achieved 98% accuracy by integrating Speech-to-Text (STT) and a fine-tuned KoBERT model for phishing detection.
- Optimized inference speed by processing audio in chunks for efficient streaming (pipeline sketched below).
- Deployed the model on Android using ONNX and TensorFlow Lite.
- Project implementation
- DeepVoiceGuard – AI vs Human Voice Classification
- Created a model that identifies AI-generated voices in phone conversations.
- Collected and processed the ASVspoof2019 dataset of real and AI-generated voices.
- Achieved 95.8% accuracy and deployed the model on Hugging Face and local servers via FastAPI (serving sketch below).
- Project implementation
- Fine-Tuning Whisper for Korean and Uzbek Speech Recognition
- Fine-tuned Whisper-medium model for Uzbek speech-to-text using a fully custom dataset collected and curated specifically for the Uzbek language.
- Achieved a Word Error Rate (WER) of 6.48% on the evaluation set, demonstrating high accuracy for real-world Uzbek speech scenarios.
- Built a Korean speech dataset combining voice phishing, phone conversations, and AI Hub data for domain-specific STT.
- Achieved a WER of 9.17% on Korean test data, underscoring the model’s effectiveness in voice-security contexts (evaluation sketch below).
- AI-Generated vs Real Music Classification
- Curated a dataset of 1M+ samples across 10 classes for music classification.
- Developed a custom AI model to distinguish real vs AI-generated music.
- Applied quantization (ONNX, TensorFlow Lite) for high-performance mobile inference (export sketch below).
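
For context, a minimal sketch of the chunked transcribe-then-classify pipeline behind DeepVoice, assuming a Hugging Face STT checkpoint and a fine-tuned KoBERT classifier; the model paths, chunk length, and output shape here are placeholders rather than the production values.

```python
# Hypothetical sketch: split a call recording into fixed-length chunks,
# transcribe each chunk, and classify the text for phishing cues.
import soundfile as sf
from transformers import pipeline

CHUNK_SECONDS = 10  # placeholder chunk length

stt = pipeline("automatic-speech-recognition", model="openai/whisper-medium")
clf = pipeline("text-classification", model="path/to/kobert-phishing")  # placeholder path

def detect_phishing(wav_path):
    audio, sr = sf.read(wav_path)
    if audio.ndim > 1:                  # downmix stereo to mono
        audio = audio.mean(axis=1)
    audio = audio.astype("float32")
    step = CHUNK_SECONDS * sr
    results = []
    for start in range(0, len(audio), step):
        chunk = audio[start:start + step]
        text = stt({"raw": chunk, "sampling_rate": sr})["text"].strip()
        if text:
            results.append({"text": text, "prediction": clf(text)[0]})
    return results
```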
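A sketch of the FastAPI serving layer for a DeepVoiceGuard-style deployment; `classify_voice` and the endpoint shape are hypothetical stand-ins for the actual inference code.

```python
# Hypothetical FastAPI endpoint: accept an uploaded audio file and return
# an AI-vs-human prediction. `classify_voice` is a placeholder helper.
import io

import soundfile as sf
from fastapi import FastAPI, UploadFile

def classify_voice(audio, sample_rate):
    # Placeholder for the trained AI-vs-human voice classifier.
    return "ai_generated", 0.97

app = FastAPI(title="DeepVoiceGuard")

@app.post("/predict")
async def predict(file: UploadFile):
    audio, sr = sf.read(io.BytesIO(await file.read()))
    label, score = classify_voice(audio, sr)
    return {"label": label, "confidence": float(score)}
```

Assuming the file is saved as `main.py`, it can be served locally with `uvicorn main:app` and queried by POSTing a WAV file to `/predict`.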
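A small sketch of how WER figures like those above can be computed for a fine-tuned Whisper checkpoint with the `evaluate` library; the checkpoint path and sample format are assumptions.

```python
# Hypothetical WER evaluation: transcribe held-out samples with a fine-tuned
# Whisper checkpoint and score them against reference transcripts.
import evaluate
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="path/to/whisper-medium-uzbek")  # placeholder checkpoint
wer_metric = evaluate.load("wer")

def compute_wer(samples):
    """samples: iterable of dicts with 'audio' (1-D np.ndarray), 'sampling_rate', 'text'."""
    predictions, references = [], []
    for s in samples:
        out = asr({"raw": s["audio"], "sampling_rate": s["sampling_rate"]})
        predictions.append(out["text"].lower().strip())
        references.append(s["text"].lower().strip())
    return wer_metric.compute(predictions=predictions, references=references)
```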
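A hedged sketch of the ONNX export and weight-only INT8 quantization step used for mobile-friendly inference; the stand-in model, input shape, and file names are illustrative only.

```python
# Hypothetical export/quantization flow: trace a (stand-in) PyTorch classifier
# to ONNX, then apply dynamic INT8 quantization with ONNX Runtime.
import torch
import torch.nn as nn
from onnxruntime.quantization import QuantType, quantize_dynamic

# Stand-in for the trained audio/music classifier.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

dummy = torch.randn(1, 128)  # one pooled feature vector (illustrative shape)
torch.onnx.export(
    model, dummy, "classifier.onnx",
    input_names=["features"], output_names=["logits"],
    dynamic_axes={"features": {0: "batch"}},
    opset_version=17,
)

# Weight-only (dynamic) quantization -- no calibration dataset required.
quantize_dynamic("classifier.onnx", "classifier.int8.onnx", weight_type=QuantType.QInt8)
```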
AI Research Engineer
Aria Studios Co. Ltd | Jun 2023 - Nov 2024
- Real-time Live Portrait Optimization:
- Optimized the Live_Portrait model for real-time performance on webcam and monitor setups, achieving smooth, low-latency operation (see the webcam-loop sketch below).
- The project gained significant recognition on GitHub, receiving a large number of stars and positive feedback from the community.
- Technologies: Real-time Image Processing, Webcam & Monitor Integration, Model Optimization, Python.
- Project implementation
- Image Enhancement & DeepFake Creation for Broadcast:
- Enhanced image quality and restored facial features to improve the realism of DeepFake videos.
- Produced high-quality DeepFake videos for KBS election coverage, showcasing the potential of advanced ML techniques in media.
- Technologies: Image Enhancement, Face Restoration, DeepFake Generation, Python, GANs, OpenCV.
- Project implementation
- Multimodal User Interaction System:
- Created an integrated system combining gaze tracking, emotion estimation, and audio-to-text conversion to enhance user interaction.
- Enabled real-time adaptive responses for entertainment applications, significantly improving user engagement.
- Technologies: Gaze Tracking, Emotion Estimation, Audio Processing, Python, Machine Learning.
- Interactive Hyundai Car Models:
- Developed a pipeline using the IP-Adapter model to generate interactive 3D car models from grayscale images and user-provided text prompts.
- Enabled users to design and visualize both classic and futuristic car models in real-time, preserving the target logos and aesthetics.
- Technologies: 3D Modeling, Image-to-3D Conversion, User Interaction, Python, TensorFlow.
- 3D Scene Creation for Interactive Films:
- Implemented 3D Gaussian splatting to create detailed and lifelike 3D reconstructions from point cloud data.
- Optimized 3D rendering processes to enhance the realism of environments and characters in interactive films.
- Technologies: 3D Reconstruction, Gaussian Splatting, Point Cloud Processing, OpenCV, PyTorch.
- Facial Performance Transfer System:
- Developed an advanced system for AI avatars to deliver multilingual speech with highly realistic facial expressions and lip synchronization.
- Leveraged state-of-the-art deep learning and real-time video processing techniques, resulting in a 30% increase in user engagement.
- Utilized a combination of neural networks to generate high-quality DeepFake videos from driver video and audio inputs.
- Technologies: Deep Learning, Real-time Video Processing, Facial Animation, Python, TensorFlow.
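
As referenced in the Live Portrait item above, a generic sketch of a real-time webcam inference loop; `animate_frame` is a hypothetical placeholder for the optimized model call.

```python
# Hypothetical real-time webcam loop: grab frames, run the (placeholder)
# portrait-animation step, and display the result on a monitor.
import cv2

def animate_frame(frame):
    # Placeholder for the optimized portrait-animation inference step.
    return frame

cap = cv2.VideoCapture(0)               # default webcam
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out = animate_frame(frame)      # real system runs the optimized model here
        cv2.imshow("live-portrait", out)
        if cv2.waitKey(1) & 0xFF == ord("q"):   # press 'q' to quit
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```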
Research Experience
Research Assistant
CNU AI & A Lab | Sep 2020 - Feb 2023
- Uzbek Sign Language Detection System:
- Developed a real-time system using Mediapipe and OpenCV to recognize Uzbek sign language with 98% accuracy (see the recognition-loop sketch below).
- Translated hand gestures into text, providing an effective communication tool for the hearing impaired.
- Technologies: Sign Language Recognition, Computer Vision, Mediapipe, OpenCV, Python.
- Project Implementation
- License Plate Detection System:
- Implemented a high-precision license plate detection system using the YOLOv7 model and CCPD dataset.
- Achieved robust performance in diverse environments, enhancing automated vehicle monitoring and access control systems.
- Technologies: Object Detection, YOLOv7, Image Processing, Python.
- Early Lung Cancer Detection Model:
- Built and optimized a classification and segmentation model to detect early-stage lung cancer, improving diagnostic accuracy by 20%.
- Technologies: Medical Imaging, Machine Learning, Image Segmentation, Python, TensorFlow.
Please visit https://github.com/Mrkomiljon to see more implementations of different ML models.
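
As referenced in the sign-language item above, a condensed sketch of a MediaPipe-plus-OpenCV recognition loop; `classify_gesture` is a hypothetical placeholder for the trained landmark-to-letter classifier.

```python
# Hypothetical sketch: detect hand landmarks with MediaPipe and feed them to a
# (placeholder) gesture classifier that maps landmarks to sign-language letters.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def classify_gesture(landmarks):
    # Placeholder for the trained landmark -> letter classifier.
    return "?"

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark  # 21 (x, y, z) points
            letter = classify_gesture([(p.x, p.y, p.z) for p in lm])
            cv2.putText(frame, letter, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("sign-language", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```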
Education
| Institution | Degree | Duration |
| --- | --- | --- |
| Chonnam National University | MSc in Computer Engineering; advised by Prof. Chang Gyoon Lim; GPA: 3.63/4.5 | Sep 2019 - Feb 2023 |
| Tashkent University of Information Technologies | BSc in Computer Engineering; GPA: 85/100 | Sep 2014 - Jun 2018 |
Publications
- “An efficient stacking ensemble learning method for customer churn prediction” (2023)
Languages
- English: Full Professional Proficiency
- Korean: Limited Working Proficiency
- Uzbek: Native Proficiency
- Russian: Limited Working Proficiency
Last Updated: 2025-03-21