Hi, I'm Kanishk Kapoor
MSc Computing (Data Analytics) at Dublin City University · Building production LLM pipelines, agentic AI systems, and real-time data infrastructure. Currently at Medicidiom; previously at IBM.
Who I Am
A passionate technologist bridging the gap between cutting-edge AI research and production-grade engineering.
I'm an AI Developer & Data Engineer currently pursuing my MSc Computing (Data Analytics) at Dublin City University, expecting a 1:1 (First Class Honours).
I specialise in building LLM-powered agentic systems, production ML pipelines, and real-time data infrastructure using tools like the OpenAI API, LangChain, Apache Kafka, Azure, and AWS. I've shipped 30+ projects spanning FinTech, healthcare, energy, and logistics.
I'm currently interning at Medicidiom (Spain, remote), where I build production AI automation workflows and document-intelligence pipelines that have processed 1,000+ documents and improved data accuracy by about 25%.
Interests
MSc Computing (Data Analytics)
Dublin City University · 1:1 Expected · 2025–Present
B.Tech Computer Science
UPES · CGPA 8.7/10 · 2020–2024
AI Automation Intern
Medicidiom, Spain (Remote) · Feb 2026–Present
Based in Dublin, Ireland
Open to remote & on-site opportunities
Quick Facts
Impact by the Numbers
Real results from real production systems
Projects Shipped
Spanning AI, Data Engineering, ML & Full Stack
Records Processed
Across ML pipelines and data engineering projects
Manual Effort Reduced
At Medicidiom via AI automation workflows
Pipeline Uptime
Production data pipelines at Medicidiom
Accuracy Improvement
ML models at IBM for threat detection
Documents Processed
Via LLM-powered intelligence pipelines
My Toolkit
A comprehensive stack spanning AI, data engineering, cloud, and full-stack development.
AI & LLMs
ML & Deep Learning
Languages
Data & Pipelines
Cloud & DevOps
Visualisation & BI
Core Language Proficiency
My Journey
From classrooms to production systems at IBM and beyond.
Work Experience
AI Automation & Operations Intern
- Architected LLM-powered document-intelligence pipelines (Python + OpenAI API) processing 1,000+ documents, improving data accuracy by ~25% and cutting manual review by 35%
- Built agentic AI automation workflows eliminating ~45% of manual effort and reducing analytics turnaround by 30%
- Maintained 99%+ uptime on production pipelines with ~20% latency reduction
- Created Power BI dashboards surfacing live operational KPIs, reducing ad-hoc reporting requests by ~40%
Cybersecurity & Data Analysis Intern
- Applied ML classification models (Python, scikit-learn) to millions of security records, improving detection accuracy by 22% and reducing false positives by 15%
- Built and evaluated multiple model architectures on multi-year datasets
- Improved outbreak forecasting accuracy by 18% through systematic experimentation
- Delivered analytical findings to senior analysts to directly inform remediation decisions
Education
MSc Computing (Data Analytics)
Dublin City University
B.Tech Computer Science
University of Petroleum & Energy Studies
Certifications
Google Data Analytics Professional Certificate
2024
Forecasting in Business — Deakin University
2024
Data Analytics for Investment
2024
What I've Built
30+ projects spanning AI, data engineering, machine learning, and full-stack development.
Product Analytics MCP / LLM Agent
Agentic AI system that lets users query product analytics in plain English. A Model Context Protocol (MCP) server provides the natural-language interface, so users need no manual SQL or direct BI tool access.
AI / LLM
News Intelligence Dashboard
Real-time data pipeline: news API → dual-model summarisation (OpenAI GPT-4 + HuggingFace DistilBART fallback) → Streamlit dashboard with smart API rate-limit handling.
AI / LLM
Project Aeroflow: Real-Time Pipeline
End-to-end real-time airline delay data pipeline: FastAPI producer → Apache Kafka → Azure Event Hubs → Databricks PySpark (Bronze/Silver/Gold) → Snowflake → live Snowsight KPI dashboards.
Data Engineering
E-Commerce Sales Pipeline
Real-time order streaming pipeline: FastAPI event source → Kafka → Spark stream processing → structured JSON in AWS S3 with Airflow orchestration and Docker containerisation.
Data Engineering
European Water Quality ML Model
Research-grade ML pipeline on 5M+ European environmental records, combining spatio-temporal feature engineering with gradient boosting to predict nitrate/phosphate pollution risk across 4 water body types.
Machine Learning
Transaction Fraud Detection
FinTech fraud detection on 18K+ transactions with feature engineering, statistical validation (ANOVA, Mann-Whitney U), and XGBoost with deliberate class-imbalance handling.
Machine Learning
Let's Connect
Open to internships, graduate roles, and exciting project collaborations. Let's build something great together!