GitHub 🐱‍💻 | LinkedIn 🔗

About Me

I am a dedicated AI Developer with 2+ years of industry and 3+ years of academic experience in machine learning, deep learning, NLP, and computer vision. My expertise includes speech-to-text (STT), large language models (LLMs), DeepFake detection, and AI-generated content classification. I have successfully built and deployed real-time AI models for voice authentication, phishing detection, and AI-driven audio classification.

I specialize in model quantization (ONNX, TensorFlow Lite, TensorRT), optimization, and scalable AI solutions for mobile and server deployments. My work focuses on multilingual speech recognition, AI-based security systems, and real-time inference acceleration, ensuring efficient and high-performance AI applications.


Skills Summary

  • Programming Languages: Python, C/C++, Java
  • Databases & Data Processing: MySQL, PostgreSQL, PySpark
  • ML & AI Frameworks: PyTorch, TensorFlow, HuggingFace Transformers, Scikit-learn, PyTorch Lightning
  • Speech & NLP: Whisper, KoBERT, Transformers, Speech-to-Text (STT), Large Language Models (LLMs)
  • MLOps & Optimization: Docker, Kubernetes, MLFlow, FastAPI, TorchServe, TensorRT, ONNX, TensorFlow Lite
  • Development Tools: Git/GitHub, CI/CD, Docker Compose
  • Cloud & Deployment: AWS EC2, GCP, Edge AI, Mobile AI Model Deployment

Main Competencies

  • Computer Vision & Image Processing: Object Detection, Object Tracking, OCR, Image Restoration & Enhancement, Medical Imaging
  • Speech & NLP: STT, Natural Language Processing, Large Language Models (LLMs), Vision-Language Models, Generative AI
  • AI Model Development & Optimization: Model Quantization, Real-time AI Systems, DeepFake Detection, Clustering, Re-Identification
  • End-to-End AI Solutions: Building Scalable AI Pipelines, Deployment Pipelines, GCP & Cloud AI Model Deployment

Work Experience

AI Developer

Museblossome | Nov 2024 - Present

  • DeepVoice – Real-time Voice Phishing Detection
    • Achieved 98% accuracy by integrating Speech-to-Text (STT) and a fine-tuned KoBERT model for phishing detection.
    • Optimized inference speed by processing audio in chunks for efficient streaming.
    • Deployed the model on Android using ONNX and TensorFlow Lite.
    • Project implementation
  • DeepVoiceGuard – AI vs Human Voice Classification
    • Created a model that identifies AI-generated voices in phone conversations.
    • Collected and processed the ASVspoof2019 dataset of real and AI-generated voices.
    • Achieved 95.8% accuracy and deployed the model on Hugging Face and local servers via FastAPI.
    • Project implementation
  • Fine-Tuning Whisper for Korean and Uzbek Speech Recognition
    • Fine-tuned Whisper-medium model for Uzbek speech-to-text using a fully custom dataset collected and curated specifically for the Uzbek language.
    • Achieved a Word Error Rate (WER) of 6.48% on the evaluation set, demonstrating high accuracy for real-world Uzbek speech scenarios.
    • Built a Korean speech dataset combining voice phishing, phone conversations, and AI Hub data for domain-specific STT.
    • Achieved a WER of 9.17% on Korean test data, emphasizing the model’s effectiveness in voice security contexts.
  • AI-Generated vs Real Music Classification
    • Curated a dataset of 1M+ samples across 10 classes for music classification.
    • Developed a custom AI model to distinguish real vs AI-generated music.
    • Applied quantization (ONNX, TensorFlow Lite) for high-performance mobile inference.
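The quantization work above (ONNX, TensorFlow Lite) ultimately rests on mapping float weights to low-bit integers. Below is a minimal sketch of symmetric int8 post-training quantization with hypothetical helper names (`quantize_int8`, `dequantize`); a real pipeline would use the ONNX Runtime or TFLite converter tooling rather than hand-rolled code.

```python
# Sketch of symmetric int8 post-training quantization, the core idea
# behind ONNX / TensorFlow Lite weight quantization. Helper names are
# illustrative only; production code uses onnxruntime.quantization or
# tf.lite.TFLiteConverter.

def quantize_int8(weights):
    """Map float weights to int8 with a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.31, -1.27, 0.004, 0.89]   # made-up example weights
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

The round-trip error is bounded by half a quantization step (`scale / 2`), which is why int8 weight quantization usually costs little accuracy while shrinking the model roughly 4x for mobile inference.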

AI Research Engineer

Aria Studios Co. Ltd | Jun 2023 - Nov 2024

  • Real-time Live Portrait Optimization:
    • Optimized the Live_Portrait model for real-time performance using webcam and monitor setups, achieving seamless and responsive operation.
    • The project has gained significant recognition on GitHub, receiving a high number of stars and positive feedback from the community.
    • Technologies: Real-time Image Processing, Webcam & Monitor Integration, Model Optimization, Python.
    • Project implementation
  • Image Enhancement & DeepFake Creation for Broadcast:
    • Enhanced image quality and restored facial features to improve the realism of DeepFake videos.
    • Produced high-quality DeepFake videos for KBS election coverage, showcasing the potential of advanced ML techniques in media.
    • Technologies: Image Enhancement, Face Restoration, DeepFake Generation, Python, GANs, OpenCV.
    • Project implementation
  • Multimodal User Interaction System:
    • Created an integrated system combining gaze tracking, emotion estimation, and audio-to-text conversion to enhance user interaction.
    • Enabled real-time adaptive responses for entertainment applications, significantly improving user engagement.
    • Technologies: Gaze Tracking, Emotion Estimation, Audio Processing, Python, Machine Learning.
  • Interactive Hyundai Car Models:
    • Developed a model pipeline using the IP-Adapter model to generate interactive 3D car models from grayscale images and user-provided text prompts.
    • Enabled users to design and visualize both classic and futuristic car models in real-time, preserving the target logos and aesthetics.
    • Technologies: 3D Modeling, Image-to-3D Conversion, User Interaction, Python, TensorFlow.
  • 3D Scene Creation for Interactive Films:
    • Implemented 3D Gaussian splatting to create detailed and lifelike 3D reconstructions from point cloud data.
    • Optimized 3D rendering processes to enhance the realism of environments and characters in interactive films.
    • Technologies: 3D Reconstruction, Gaussian Splatting, Point Cloud Processing, OpenCV, PyTorch.
  • Facial Performance Transfer System:
    • Developed an advanced system for AI avatars to deliver multilingual speech with highly realistic facial expressions and lip synchronization.
    • Leveraged state-of-the-art deep learning and real-time video processing techniques, resulting in a 30% increase in user engagement.
    • Utilized a combination of neural networks for generating high-quality DeepFake videos based on driver video and audio inputs.
    • Technologies: Deep Learning, Real-time Video Processing, Facial Animation, Python, TensorFlow.
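Real-time tracking pipelines like the gaze and portrait systems above commonly stabilize noisy per-frame estimates with an exponential moving average (EMA), trading a small amount of lag for much less jitter. The `EmaSmoother` class below is an illustrative sketch under that assumption, not the production filter used in these projects.

```python
# Common technique in real-time gaze/landmark pipelines: EMA smoothing
# to suppress per-frame jitter at low latency. Illustrative sketch only.

class EmaSmoother:
    """Smooth a stream of (x, y) points with an exponential moving average."""

    def __init__(self, alpha=0.4):
        self.alpha = alpha      # higher alpha -> less smoothing, less lag
        self.state = None

    def update(self, point):
        if self.state is None:
            self.state = point  # first sample initializes the filter
        else:
            self.state = tuple(
                self.alpha * p + (1.0 - self.alpha) * s
                for p, s in zip(point, self.state)
            )
        return self.state

# Noisy gaze samples jittering around a fixed target at (100, 50)
smoother = EmaSmoother(alpha=0.4)
raw = [(98, 52), (103, 47), (99, 51), (102, 49)]
smoothed = [smoother.update(p) for p in raw]
```

The `alpha` parameter sets the latency/stability trade-off per stream; a webcam pipeline would typically tune it per signal (gaze vs. facial landmarks).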

Research Experience

Research Assistant

CNU AI & A Lab | Sep 2020 - Feb 2023

  • Uzbek Sign Language Detection System:
    • Developed a real-time system using Mediapipe and OpenCV to recognize Uzbek sign language with 98% accuracy.
    • Translated hand gestures into text, providing an effective communication tool for the hearing impaired.
    • Technologies: Sign Language Recognition, Computer Vision, Mediapipe, OpenCV, Python.
    • Project Implementation
  • License Plate Detection System:
    • Implemented a high-precision license plate detection system using the YOLOv7 model and CCPD dataset.
    • Achieved robust performance in diverse environments, enhancing automated vehicle monitoring and access control systems.
    • Technologies: Object Detection, YOLOv7, Image Processing, Python.
  • Early Lung Cancer Detection Model:
    • Built and optimized a classification and segmentation model to detect early-stage lung cancer, improving diagnostic accuracy by 20%.
    • Technologies: Medical Imaging, Machine Learning, Image Segmentation, Python, TensorFlow.

Please visit https://github.com/Mrkomiljon to see more implementations of different ML models.
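Detection and segmentation work like the YOLOv7 license plate system above is typically evaluated with Intersection-over-Union (IoU). The `iou` function below is a small illustrative sketch; the box coordinates are made up for the example.

```python
# Intersection-over-Union (IoU), the standard metric for judging
# detection quality in projects like the YOLOv7 license plate detector.
# Boxes are (x1, y1, x2, y2) corner coordinates. Sketch for illustration.

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (clamped to zero when the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union else 0.0

# A hypothetical predicted plate box vs. its ground-truth annotation
pred = (10.0, 10.0, 50.0, 30.0)
gt = (12.0, 12.0, 52.0, 32.0)
score = iou(pred, gt)
```

A detection is usually counted as correct when its IoU with the ground truth exceeds a threshold such as 0.5, which is how "robust performance in diverse environments" is quantified in practice.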

Education

Institution | Degree | Duration
Chonnam National University | MSc in Computer Engineering; advised by Prof. Chang Gyoon Lim; GPA: 3.63/4.5 | Sep 2019 - Feb 2023
Tashkent University of Information Technologies | BSc in Computer Engineering; GPA: 85/100 | Sep 2014 - Jun 2018

Publications

“An efficient stacking ensemble learning method for customer churn prediction” (2023)

Languages

  • English: Full Professional Proficiency
  • Korean: Limited Working Proficiency
  • Uzbek: Native Proficiency
  • Russian: Limited Working Proficiency

Last Updated: 2025-03-21