I am a machine learning researcher and founder passionate about advancing generative AI, synthetic data generation, and ethical AI systems. As Co-Founder of tabularis.ai, I work with my team on AI solutions in the following areas:
- Highly realistic synthetic data for training efficient AI models, in both low-resource and large-scale settings
- Specialized AI models (e.g., a 23-language Multilingual Sentiment Analysis Model with 500,000+ monthly downloads)
- Tabular data, Safety AI, DPO/GRPO, and AI agents
I hold a PhD in Machine Learning & Computer Science from the University of Tuebingen, where my research focused on explainability, inference, and generative models for tabular and textual data. My interdisciplinary work spans:
- Generative AI / Large Language Models
- Synthetic Data (I published a package that is used by Google (Kaggle), AWS, and many others; it is in the top 10% of all pip packages)
I’m actively seeking collaborators, interns, and thesis students passionate about pushing boundaries in LLMs, synthetic data, and tabular machine learning.
🎓 Internships & Thesis Projects
Work on cutting-edge problems like:
- Synthetic Data Engineering: Build tools for synthetic data generation and create the best possible synthetic datasets.
- Specialized AI models: We are currently working on specialized LLMs and embedding models.
📬 Reach out via email or connect on LinkedIn.
News
- [05/06/2026] We published tuetoken, a fast tokenizer backend for LLMs, up to 30× faster than tiktoken or Hugging Face tokenizers.
- [06/02/2026] We released Faust-1, a 1.6B-parameter German language model trained from scratch, achieving competitive performance while remaining efficient enough to run on consumer hardware.
- [02/01/2026] Our paper “Do Chatbot LLMs Talk Too Much? The YapBench Benchmark” was published on arXiv, introducing a benchmark for measuring verbosity and over-generation in chatbot LLMs. arXiv
- [01/08/2024] Co-organizing a NeurIPS 2024 workshop on tabular data learning. For more details, please visit: TBL workshop website
- [14/06/2024] Our new paper on large-scale synthetic data generation using open-source LLMs was accepted to the Data-centric Machine Learning Research workshop at ICML 2024. arXiv
📅 Book a Free Consultation
Ready to elevate your project with AI/ML expertise? Schedule a 30-minute consultation to:
- Discuss your project’s vision, objectives, and challenges.
- Explore high-level deep-learning opportunities and roadmaps.
- Determine if my services align with your needs.
Please note that this session provides an overview, not in-depth technical advice or solutions. If we decide to work together, we can create a tailored plan that addresses your project’s unique challenges and goals. To schedule a consultation or for any inquiries, please email me at vadim@tabularis.ai 📩.
