Data Engineer. Analytics Consultant. AI/ML Developer

Allen Ben Philipose

About

Learn more about me

Summary

A Data Engineer and Analyst with 2+ years of experience in technical consulting, specialising in the design and deployment of cloud-native ETL and ML pipelines processing terabyte-scale datasets to deliver stakeholder-ready insights and drive high-impact business decisions.

Master of Computer Science (Data Science & AI)
85.1 USYD WAM
100+ Analysts Trained
20% AWS Cost Reduction
GCP Skill Badges
Profile

Sydney, NSW, Australia

Master of Computer Science

Data Science and Artificial Intelligence

University of Sydney, New South Wales, Australia

  • Jul 2024 - Jun 2026
  • WAM: 85.1/100, High Distinction
  • Sydney International Student Award Recipient
  • Peer Mentoring Advisor supporting incoming engineering students
  • Volunteer at the University of Sydney Union
  • Professional affiliations: GDG Sydney, Engineers Australia, Sydney Computing Society, 180 Degrees Consulting
Bachelor of Technology

Electronics and Communication Engineering

Vellore Institute of Technology, Tamil Nadu, India

  • Jul 2018 - Jun 2022
  • CGPA: 8.87/10
  • Specialised in Internet of Things and Sensors
  • Board Member at IoThinC, leading technical mentorship and architectural guidance
  • Founded and organised IoThon, the university's flagship IoT hackathon
  • Professional affiliations: IEEE Robotics and Automation Society, IEEE Communications Society
Higher Secondary

12th Grade

Delhi Public School, Bangalore East, KA

  • Jul 2016 - Jun 2018
  • English - Physics - Chemistry - Mathematics - Computer Science
  • Rock Band Bass Guitarist (2017)
  • 82.8%

10th Grade

Baldwin Boys' High School, Bangalore, KA

  • Jul 2012 - Jun 2016
  • English - Science - Mathematics - Malayalam - Computer Science
  • Vice Band Major (2015)
  • Rock Band Bass Guitarist (2014-2016)
  • Rock Band Pianist (2013)
  • Senior Library Monitor (2014)
  • 90.6%
Experience

My Roles

Data and Analytics Intern

L.E.K. Consulting - Sydney, New South Wales, Australia

  • Engineered end-to-end NLP Python pipelines to analyse sentiment and market behaviour.
  • Integrated geospatial indexing and catchment-based modelling to deliver geo-specific insights across multiple industry verticals.
  • Developed reusable, modular Alteryx workflows for spatial blending, predictive modelling, and analytical reporting.

Decision Analytics Associate Consultant

ZS Associates - Bangalore, Karnataka, India

  • Promoted to Associate Consultant in Jun 2024, in recognition of consistent delivery and technical leadership.
  • Architected a secure, client-specific AWS environment with EC2, EMR, S3, and RDS layers.
  • Implemented cost-monitoring guardrails that drove a 20% reduction in monthly cloud expenditure.
  • Delivered Python and SQL upskilling workshops to 100+ onboarding analysts.
  • Designed and deployed automated batch ETL pipelines for weekly sales actions and brand performance insights.
  • Built scalable PySpark pipeline frameworks for multi-brand, multi-market expansion across client portfolios generating $5B+ annually.

Decision Analytics Intern

ZS Associates - Bangalore, Karnataka, India

  • Developed analytical solutions to quantify product impact and uncover white-space opportunities.
  • Applied Real World Data methodologies in the pharmaceutical domain.
  • Built proficiency in patient-level data handling, anonymisation protocols, and regulatory compliance frameworks.

Head of Design

IoThinC - VIT - Vellore, Tamil Nadu, India

  • Structured digital learning pathways and facilitated fortnightly upskilling sessions for a 30-member cross-functional team.
  • Enabled skill development and bridged the gap between design and technology domains.

Technical Lead UI/UX

Skillship Foundation - Vellore, Tamil Nadu, India

  • Mentored three student software development teams on web frameworks and full-cycle project delivery.
  • Led UX wireframing efforts to redesign the organisation's official website.
  • Established a unified social media visual identity and formalised PR team guidelines.
  • Drove a 70% increase in event attendance and strengthened student engagement.

Data Analytics Intern

Sparks Foundation

  • Built and fine-tuned machine learning models using Python.
  • Coordinated with peers to improve overall performance.
  • Analysed business data to design visualisation dashboards.
Skills

Tech Stack

Expertise across the end-to-end lifecycle of data products, from infrastructure provisioning and robust ETL orchestration to ML and LLM frameworks for stakeholder-ready insights.

Data Science and ML

TensorFlow, PyTorch, Scikit-learn, NLP, YOLO, Transformers, OpenCV, Pandas, NumPy, and RAG.

Data Engineering

Apache Spark, PySpark, Apache Airflow, Hadoop, Alteryx, Redshift, DBT, and ETL/ELT pipelines.

Data Analytics and BI

MySQL, PostgreSQL, Snowflake, Tableau, Databricks, Google Analytics, Power BI, and Excel.

Cloud and Infrastructure

Amazon Web Services, Google Cloud Platform, Firebase, and CI/CD pipelines.

Software Development

Python, SQL, C/C++, BASH, Java with Spring and Hibernate, GraphQL, REST APIs, Git, and YAML.

Design and Collaboration

Jira, Figma, Postman, Adobe Creative Cloud, Dialogflow, Microsoft 365, and LaTeX.

Projects

My Work

Resume projects are highlighted first. Earlier academic and exploratory projects are listed below.

Python. TensorFlow. OpenCV. YOLO.

Emergency Vehicle Detection and Tracking

Developed a cost-effective, real-time system that processes live surveillance feeds using TensorFlow to detect emergency vehicles and dynamically clear their paths. Designed for scalability across diverse environmental conditions and optimised image recognition latency to support faster response times.

Python. X3D CNN. Agentic AI. Vector DB.

LLM Surveillance Video Querying

Architected a multi-modal AI security platform enabling operators to query large-scale surveillance footage using natural language inputs. Leveraged transformer-based vision-language models to surface time-indexed events and objects, eliminating manual review of high-volume video archives and reducing investigation time.

Python. SpaCy. NLTK. Plotly.

Sentiment Analysis on Racism

Conducted a large-scale NLP study analysing public sentiment across U.S. geographic regions using web-scraped Twitter data collected following the George Floyd case. Applied tokenisation, named entity recognition, and sentiment scoring to identify statistically meaningful regional patterns and shifts in public opinion over time.

Python. CNN. GAF. PyFeat.

Deep Learning Emotion Recognition

Engineered a novel framework that converts 1D physiological time-series signals into 2D Gramian Angular Field image representations, enabling CNNs to classify emotional states and stress levels with high accuracy. Bridges biomedical signal processing and image-based deep learning, with applications in mental health monitoring.
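The 1D-to-2D conversion described above can be sketched in a few lines. This is a minimal, standard-library illustration of the Gramian Angular Summation Field, not the project's actual code: the function name `gramian_angular_field` and the sample `signal` values are hypothetical, and the CNN stage is omitted.

```python
import math


def gramian_angular_field(series):
    """Return the Gramian Angular Summation Field of a 1D series as a list of rows."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Rescale every sample to [-1, 1] so that arccos is defined for it.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against floating-point overshoot at the boundaries.
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]
    # Encode each sample as an angle in polar coordinates.
    angles = [math.acos(s) for s in scaled]
    # Entry (i, j) is cos(phi_i + phi_j), which captures pairwise temporal correlation.
    return [[math.cos(a + b) for b in angles] for a in angles]


# Hypothetical samples standing in for a physiological signal window.
signal = [0.0, 1.0, 2.0, 1.0, 0.0]
image = gramian_angular_field(signal)
```

The result is a symmetric N x N matrix for an N-sample window, so it can be treated like a single-channel image and fed to a convolutional network.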

2021 Crowd Management

This project uses big data analytics to analyse crowd movement and the vehicles present in traffic, with the aim of optimising pedestrian flow. When congestion is detected, the system sends the user a text message and reroutes them along the next shortest path to their destination.
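The rerouting step can be illustrated with a small shortest-path sketch. It assumes the area is modelled as a weighted graph and that congested locations are simply excluded from the search; the project description does not state which algorithm was used, so Dijkstra's algorithm, the `shortest_route` function, and the `roads` network are all hypothetical. The text-message alert is not covered here.

```python
import heapq


def shortest_route(graph, start, goal, congested=frozenset()):
    """Dijkstra's algorithm that skips congested nodes.

    Returns (distance, path), or None when no route exists.
    """
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            # Congested nodes are treated as unavailable for this query.
            if neighbour not in visited and neighbour not in congested:
                heapq.heappush(queue, (dist + weight, neighbour, path + [neighbour]))
    return None


# Hypothetical road network: node -> {neighbour: distance}.
roads = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "D": 2},
    "C": {"A": 5, "D": 2},
    "D": {"B": 2, "C": 2},
}

normal = shortest_route(roads, "A", "D")
rerouted = shortest_route(roads, "A", "D", congested={"B"})
```

With no congestion the route runs through B; marking B as congested makes the search fall back to the longer route through C.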

2021 Soil Moisture Prediction

A deep learning regression network capable of fitting large datasets was proposed to build a soil moisture prediction model. By integrating the dataset and analysing the time series of the predictor variables, the selected meteorological parameters provide effective weights for moisture prediction.

2020 Gesture Based Doorbell

An Arduino-based project developed to reduce contact with touch surfaces during the COVID-19 pandemic. The system detects object proximity with an HC-SR04 ultrasonic sensor and uses the reading to control the door movement and the alert system. Data is stored in the cloud via an ESP8266.

2020 Canny Edge Detection

A detailed walkthrough of how Canny edge detection in the OpenCV library works. The project has four sections: basic image processing operations applied to a sample image, the inner workings of Canny edge detection, the OpenCV function on a live camera feed, and its application to a video with tweakable parameters.

2020 Travel Log Comparison

The project uses NLTK to analyse articles against predetermined parameters that gauge the quality of each piece. Article URLs are given as input; the program scrapes and tokenises each article, then runs it through every condition to evaluate and rank the writers.

2020 Trading Chatbot

Uses the ChatterBot library for Python to build a chatbot that not only interacts naturally with the user but also helps trade stocks via Yahoo Finance. The chatbot obtains live data through web scraping and historical data from a cloud data server, and executes functions based on the user's input.

2020 Surveillance System with Person ReID

A surveillance analytics project built around identifying and matching people across camera frames. The work explored computer vision, person re-identification, and matching logic for security-oriented video review.

2019 Digital Lock

Verilog is a hardware description language used to model digital circuits. In this project, an Altera DE2-115 board is programmed to simulate a digital lock. The design is written entirely with logical and bitwise operations and a manually set clock frequency.

2019 Autocorrect with Tries

A trie, also called a digital tree or prefix tree, is a data structure used here to predict words while a sentence is being typed. This project goes a step further by maintaining its own dictionary and autocorrecting input words against its contents, which improves accuracy.
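A minimal sketch of the idea, assuming a plain dictionary-backed trie. The class and method names are hypothetical, and the correction rule (fall back to the shortest completion of the longest known prefix) is an illustrative assumption, since the description does not specify how the project picks a correction.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True when the path to this node spells a stored word


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, prefix):
        """Return the node reached by following prefix, or None if it is absent."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def complete(self, prefix):
        """Return all stored words starting with prefix, in alphabetical order."""
        start = self._walk(prefix)
        if start is None:
            return []
        results = []
        stack = [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_word:
                results.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(results)

    def autocorrect(self, word):
        """Assumed rule: keep known words, else use the longest prefix that has completions."""
        if self.contains(word):
            return word
        for end in range(len(word), 0, -1):
            candidates = self.complete(word[:end])
            if candidates:
                return min(candidates, key=lambda w: (len(w), w))
        return word  # nothing similar in the dictionary; leave the input unchanged


trie = Trie(["hello", "help", "helmet", "world"])
suggestions = trie.complete("hel")
corrected = trie.autocorrect("helpx")
```

Lookups and insertions cost time proportional to the word length rather than the dictionary size, which is what makes a trie a good fit for per-keystroke prediction.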