ShadowSpeak
A full-stack ESL learning platform where students practice pronunciation by recording themselves mimicking YouTube video segments, and teachers review the submissions and provide feedback. Each recording also receives an AI-scored pronunciation evaluation from the Azure Cognitive Services Speech API.
Project Goal
Build a platform that replaces the scattered workflow ESL teachers use today: Google Drive for storage, email for submissions, and screen recording software for practice. Students can loop specific YouTube segments, record themselves in-browser, and submit directly. Teachers create lessons, assign them to students, and review submissions with threaded feedback. Azure Speech API automatically scores each recording for pronunciation accuracy.
Tech Stack
- Next.js 15 with App Router, React 19, TypeScript
- Express.js backend with PostgreSQL database (8 tables)
- Azure Blob Storage for audio recordings and images
- Azure Cognitive Services Speech SDK for pronunciation scoring
- Cloudinary for teacher video uploads and audio extraction
- JWT authentication with role-based access control
- SWR for data fetching with automatic revalidation
- Material UI components and WaveSurfer.js waveform visualization
- next-intl for multi-language support
- Resend for email notifications, Sentry for error monitoring
- Husky + lint-staged for pre-commit code quality enforcement
Solo Project
This is my most comprehensive project. I built everything from scratch:
- Azure Speech API integration: evaluates recorded audio and returns overall accuracy, fluency, completeness, and pronunciation scores, along with per-word accuracy
- Browser audio recording using MediaRecorder API with start, pause, resume, and stop controls
- YouTube segment looping via custom 100ms polling (the YouTube player API has no native way to loop a segment of a video)
- Cloud storage pipeline: recordings converted to base64, uploaded to Azure Blob Storage, URLs persisted to database
- Phrase practice pipeline: audio segments with start/end timestamps are linked to lessons so students can practice individual phrases
- Threaded feedback system: teachers and students exchange replies on each submission, and the replies are stored in a feedback_replies table
- Student progress tracking: each assignment moves through a status machine (new → in_progress → submitted → completed), plus a collapsible lesson archive
- JWT auth with RBAC: teachers and students have different permissions enforced on both frontend and backend
- Email notifications via Resend API alert teachers when students submit recordings
- PostgreSQL schema with 8 tables (users, lessons, assignments, audio_segments, practice_words, practice_results, lists, feedback_replies), proper foreign keys and cascade deletes
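The status flow in the progress-tracking item above can be sketched as a small transition table. This is a minimal illustration, not the project's actual code: it assumes the flow is strictly linear with no step backwards, and the names `AssignmentStatus`, `NEXT_STATUS`, `canTransition`, and `advance` are hypothetical.

```typescript
// The four statuses come from the write-up; everything else is illustrative.
type AssignmentStatus = "new" | "in_progress" | "submitted" | "completed";

// Assumption: each status has exactly one successor and "completed" is terminal.
const NEXT_STATUS: Record<AssignmentStatus, AssignmentStatus | null> = {
  new: "in_progress",
  in_progress: "submitted",
  submitted: "completed",
  completed: null,
};

// True only when `to` is the direct successor of `from`.
function canTransition(from: AssignmentStatus, to: AssignmentStatus): boolean {
  return NEXT_STATUS[from] === to;
}

// Moves a status one step forward, refusing to advance past the terminal status.
function advance(status: AssignmentStatus): AssignmentStatus {
  const next = NEXT_STATUS[status];
  if (next === null) {
    throw new Error(`"${status}" is a terminal status`);
  }
  return next;
}
```

Keeping the allowed transitions in one table means the backend can reject an out-of-order update (for example new → completed) with a single lookup.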
Technical Challenges Solved
YouTube Segment Looping
The YouTube embedded player does not support native segment looping. I built a custom solution around a useLoopButtons hook that polls the video position every 100ms. When playback reaches the end timestamp, the hook seeks back to the start timestamp. A state machine tracks where the user is in the setup: idle, start_set, ready, or looping.
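The polling approach can be sketched roughly as follows. This is a simplified illustration rather than the actual useLoopButtons hook: the four state names come from the description above, while the event names, `nextLoopState`, `loopTick`, `startLoopPolling`, and the `SeekablePlayer` interface are assumptions. The interface mirrors the `getCurrentTime()` and `seekTo(seconds, allowSeekAhead)` methods of the YouTube IFrame player.

```typescript
type LoopState = "idle" | "start_set" | "ready" | "looping";

// Hypothetical events; the real hook may name or group them differently.
type LoopEvent = "SET_START" | "SET_END" | "TOGGLE_LOOP" | "CLEAR";

// Pure transition function: events that make no sense in a state are ignored.
function nextLoopState(state: LoopState, event: LoopEvent): LoopState {
  switch (event) {
    case "SET_START":
      return state === "idle" ? "start_set" : state;
    case "SET_END":
      return state === "start_set" ? "ready" : state;
    case "TOGGLE_LOOP":
      if (state === "ready") return "looping";
      if (state === "looping") return "ready";
      return state;
    case "CLEAR":
      return "idle";
  }
}

// The only parts of the player this sketch relies on.
interface SeekablePlayer {
  getCurrentTime(): number;
  seekTo(seconds: number, allowSeekAhead: boolean): void;
}

// One polling tick: jump back to `start` once playback has reached `end`.
// Returns true when a seek happened.
function loopTick(player: SeekablePlayer, start: number, end: number): boolean {
  if (player.getCurrentTime() >= end) {
    player.seekTo(start, true);
    return true;
  }
  return false;
}

// Runs loopTick on a timer and returns a function that stops the polling.
function startLoopPolling(
  player: SeekablePlayer,
  start: number,
  end: number,
  intervalMs = 100,
): () => void {
  const id = setInterval(() => loopTick(player, start, end), intervalMs);
  return () => clearInterval(id);
}
```

In a React hook, the function returned by `startLoopPolling` would typically be called from an effect cleanup so the timer stops when looping is switched off or the component unmounts. With a 100ms interval, playback can overshoot the end timestamp by up to roughly that amount before the seek fires.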
Browser Audio Recording Pipeline
Recording audio in the browser and uploading it to cloud storage takes several integration steps: capture the audio stream with the MediaRecorder API, convert the resulting Blob to base64, send it to the Express backend, decode it into a Buffer, stream it to Azure Blob Storage, return the blob URL, and persist that URL to PostgreSQL. The whole flow is managed through a reducer in RecorderPanelContext.
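A reducer for this kind of flow might look like the sketch below. It is an assumption about shape, not the contents of RecorderPanelContext: the status names, action names, and state fields are all hypothetical. It only models the UI side (record, pause, resume, stop, upload) and leaves out the MediaRecorder and network calls that would dispatch these actions.

```typescript
// Hypothetical statuses for the recorder panel.
type RecorderStatus = "idle" | "recording" | "paused" | "stopped" | "uploading" | "saved";

interface RecorderState {
  status: RecorderStatus;
  audioUrl: string | null; // URL returned by the backend after upload
  error: string | null;    // last upload error, if any
}

// Hypothetical actions dispatched by the recording controls and upload code.
type RecorderAction =
  | { type: "START" }
  | { type: "PAUSE" }
  | { type: "RESUME" }
  | { type: "STOP" }
  | { type: "UPLOAD_STARTED" }
  | { type: "UPLOAD_SUCCEEDED"; audioUrl: string }
  | { type: "UPLOAD_FAILED"; message: string }
  | { type: "RESET" };

const initialRecorderState: RecorderState = { status: "idle", audioUrl: null, error: null };

// Actions that are invalid for the current status leave the state unchanged.
function recorderReducer(state: RecorderState, action: RecorderAction): RecorderState {
  switch (action.type) {
    case "START":
      return state.status === "idle" ? { ...state, status: "recording" } : state;
    case "PAUSE":
      return state.status === "recording" ? { ...state, status: "paused" } : state;
    case "RESUME":
      return state.status === "paused" ? { ...state, status: "recording" } : state;
    case "STOP":
      return state.status === "recording" || state.status === "paused"
        ? { ...state, status: "stopped" }
        : state;
    case "UPLOAD_STARTED":
      return state.status === "stopped" ? { ...state, status: "uploading", error: null } : state;
    case "UPLOAD_SUCCEEDED":
      return state.status === "uploading"
        ? { ...state, status: "saved", audioUrl: action.audioUrl }
        : state;
    case "UPLOAD_FAILED":
      // Fall back to "stopped" so the same recording can be re-uploaded.
      return state.status === "uploading"
        ? { ...state, status: "stopped", error: action.message }
        : state;
    case "RESET":
      return initialRecorderState;
  }
}
```

In a React context provider this would be wired up with `useReducer(recorderReducer, initialRecorderState)`, so that every control in the panel reads from one state object instead of several independent flags.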
Role-Based Access Control
I implemented two-tier access: JWT tokens carry a role claim ("teacher" or "student"), middleware validates the token on every API request, and protected routes reject the request when req.user.role !== "teacher" before any destructive operation runs. Students who try to access teacher endpoints get 403 Forbidden.
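The role check can be sketched as a small middleware factory. This is an illustration under stated assumptions, not the project's code: `requireRole` is a hypothetical name, `req.user` is assumed to have been populated by an earlier JWT-verification middleware, and the request/response types are minimal stand-ins for Express's `Request` and `Response` so the block stays self-contained.

```typescript
type Role = "teacher" | "student";

// Minimal stand-ins for the Express types (assumption: req.user is set
// by a JWT-verifying middleware that runs before this one).
interface AuthedRequest {
  user?: { id: number; role: Role };
}
interface JsonResponse {
  status(code: number): JsonResponse;
  json(body: unknown): unknown;
}
type NextFn = () => void;

// Returns middleware that only lets requests with the given role through.
function requireRole(role: Role) {
  return (req: AuthedRequest, res: JsonResponse, next: NextFn): void => {
    if (!req.user) {
      // No verified token was attached to the request.
      res.status(401).json({ error: "Authentication required" });
      return;
    }
    if (req.user.role !== role) {
      // Authenticated, but the role is not allowed on this route.
      res.status(403).json({ error: "Forbidden" });
      return;
    }
    next();
  };
}
```

With real Express types, a route could then be declared along the lines of `router.delete("/lessons/:id", requireRole("teacher"), handler)`, where the route path and `handler` are placeholders.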
What I Learned
- How to integrate AI/speech recognition APIs (Azure Cognitive Services) into a production full-stack application
- How to build a complete full-stack application from database schema design to deployed production frontend
- How to work with browser APIs (MediaRecorder) and multiple cloud services (Azure Blob, Azure Speech, Cloudinary)
- How to implement proper authentication and authorization with JWT and role-based access control
- How to manage complex UI state using reducer patterns and context
- How to design relational database schemas with proper foreign keys, constraints, and indexes for query performance
- How to set up production monitoring (Sentry), transactional email (Resend), and pre-commit code quality enforcement (Husky)