Video Shorts Converter Web App
As a developer and content creator, I noticed how time-consuming it is to repurpose long-form videos for social media. This project was born out of a desire to automate and simplify that process for creators, marketers, and influencers.
The Problem / Opportunity
Long-form videos (like YouTube content, webinars, or podcasts) are packed with valuable moments, but manually clipping, reframing, and subtitling them for platforms like TikTok or Instagram Reels is tedious and repetitive. There was a clear need for a tool that could automate highlight detection, aspect ratio conversion, and export for multiple platforms.
Key Features / Solution
- Upload & Import: Upload MP4, MOV, or AVI files, or import from YouTube/Vimeo (up to 2GB per video).
- AI-Powered Clip Suggestions: Auto-detect highlights using speech transcription (Whisper), scene changes, face detection, and keyword detection.
- Automatic Shorts Generation Pipeline:
  - Transcribe video with Whisper (word-level timestamps).
  - Use Gemini LLM to select engaging segments.
  - Extract clips with FFmpeg.
  - Smart crop with OpenCV to keep speakers in frame and reframe for 9:16, 1:1, or 16:9.
  - Add auto-generated, editable subtitles and overlays.
  - Export for preview, download, or further editing.
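To illustrate the upload limits above, here is a minimal TypeScript sketch of a server-side pre-check. The function name `validateUpload` and its return shape are hypothetical; the allowed formats and the 2GB cap come from the feature list, and treating 2GB as 2 x 1024^3 bytes is my assumption.

```typescript
// Hypothetical pre-upload check. Formats and the 2GB cap come from the
// feature list; the names and the binary-GB interpretation are assumptions.
const MAX_UPLOAD_BYTES = 2 * 1024 ** 3; // 2 GiB
const ALLOWED_EXTENSIONS = new Set(["mp4", "mov", "avi"]);

type UploadCheck = { ok: true } | { ok: false; reason: string };

function validateUpload(fileName: string, sizeBytes: number): UploadCheck {
  // Take the text after the last dot as the extension, case-insensitively.
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `unsupported format: .${ext || "(none)"}` };
  }
  if (sizeBytes <= 0 || sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "file must be between 1 byte and 2GB" };
  }
  return { ok: true };
}
```

An extension check is only a first filter; a production pipeline would also probe the file (for example with `ffprobe`) before queueing any work.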
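The Gemini step returns free-form text, so its suggestions need guarding before FFmpeg sees them. The sketch below assumes the model was prompted to answer with a JSON array of `{ start, end, title }` objects in seconds; the `Word`/`Clip` shapes, the 15 to 60 second limits, and `parseClipSuggestions` are all hypothetical. It snaps each suggestion to Whisper word boundaries so cuts never land mid-word, then drops malformed, too-short, too-long, or overlapping clips.

```typescript
// Word-level timestamps from the Whisper step (field names are assumptions).
interface Word { text: string; start: number; end: number }
interface Clip { start: number; end: number; title: string }

// Hypothetical guardrails; real limits are a product decision.
const MIN_CLIP_S = 15;
const MAX_CLIP_S = 60;

// Parse the LLM's JSON answer and keep only clips that are well-formed,
// snapped to word boundaries, within the length limits, and non-overlapping.
function parseClipSuggestions(raw: string, words: Word[]): Clip[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return []; // the model did not return valid JSON
  }
  if (!Array.isArray(data)) return [];

  const clips: Clip[] = [];
  for (const item of data) {
    const start = Number(item?.start);
    const end = Number(item?.end);
    if (!Number.isFinite(start) || !Number.isFinite(end) || end <= start) continue;
    // Words that overlap the suggested range define the snapped boundaries.
    const inside = words.filter((w) => w.end > start && w.start < end);
    if (inside.length === 0) continue;
    const snapped: Clip = {
      start: inside[0].start,
      end: inside[inside.length - 1].end,
      title: typeof item?.title === "string" ? item.title : "",
    };
    const length = snapped.end - snapped.start;
    if (length < MIN_CLIP_S || length > MAX_CLIP_S) continue;
    clips.push(snapped);
  }

  // Sort by start time and drop any clip overlapping the previously kept one.
  clips.sort((a, b) => a.start - b.start);
  const result: Clip[] = [];
  for (const c of clips) {
    if (result.length === 0 || c.start >= result[result.length - 1].end) result.push(c);
  }
  return result;
}
```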
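Smart cropping ultimately reduces to choosing a crop rectangle and handing it to FFmpeg. This sketch assumes the OpenCV step has already produced a face centre in source-pixel coordinates (detection itself and smoothing across frames are out of scope) and that the output sizes are 1080x1920, 1080x1080, and 1920x1080; those sizes and the helper names `cropForFace` and `ffmpegArgs` are assumptions, while the FFmpeg options and the `crop`/`scale` filters are standard.

```typescript
type Ratio = "9:16" | "1:1" | "16:9";

// Output resolution per ratio (assumed common platform sizes).
const OUTPUT_SIZE: Record<Ratio, [number, number]> = {
  "9:16": [1080, 1920],
  "1:1": [1080, 1080],
  "16:9": [1920, 1080],
};

interface CropBox { w: number; h: number; x: number; y: number }

// libx264 with 4:2:0 chroma subsampling needs even width and height.
const even = (n: number) => Math.floor(n / 2) * 2;
const clamp = (v: number, lo: number, hi: number) => Math.min(Math.max(v, lo), hi);

// Largest crop of the target ratio that fits the source, centred on the
// face and clamped so it never leaves the frame.
function cropForFace(srcW: number, srcH: number, ratio: Ratio, faceCx: number, faceCy: number): CropBox {
  const [outW, outH] = OUTPUT_SIZE[ratio];
  let w: number;
  let h: number;
  // Integer cross-multiplication avoids floating-point ratio comparisons.
  if (srcW * outH > srcH * outW) {
    h = even(srcH); // source is wider than the target: keep full height
    w = even((srcH * outW) / outH);
  } else {
    w = even(srcW); // source is taller (or equal): keep full width
    h = even((srcW * outH) / outW);
  }
  const x = Math.round(clamp(faceCx - w / 2, 0, srcW - w));
  const y = Math.round(clamp(faceCy - h / 2, 0, srcH - h));
  return { w, h, x, y };
}

// FFmpeg arguments that cut [start, end), crop, and scale one clip.
function ffmpegArgs(input: string, output: string, start: number, end: number, crop: CropBox, ratio: Ratio): string[] {
  const [outW, outH] = OUTPUT_SIZE[ratio];
  return [
    "-ss", start.toFixed(3), // input-side seek: fast, and accurate when re-encoding
    "-i", input,
    "-t", (end - start).toFixed(3), // output duration = clip length
    "-vf", `crop=${crop.w}:${crop.h}:${crop.x}:${crop.y},scale=${outW}:${outH}`,
    "-c:v", "libx264",
    "-c:a", "aac",
    output,
  ];
}
```

A real smart crop would move `x` over time as the speaker moves instead of using one fixed box per clip, but the geometry per frame is the same.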
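Editable subtitles can start life as a standard SRT file built from Whisper's word-level timestamps. A minimal sketch, assuming each word carries `start`/`end` times in seconds relative to the full video; grouping three words per cue is an arbitrary choice for short-form captions, and `toSrt`/`srtTime` are hypothetical names.

```typescript
interface Word { text: string; start: number; end: number }

// Format seconds as an SRT timestamp, e.g. 61.5 -> "00:01:01,500".
function srtTime(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n: number, len = 2) => String(n).padStart(len, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Build SRT cues for one clip: keep words fully inside [clipStart, clipEnd],
// shift them so the clip starts at 0, and put up to `maxWords` in each cue.
function toSrt(words: Word[], clipStart: number, clipEnd: number, maxWords = 3): string {
  const inClip = words.filter((w) => w.start >= clipStart && w.end <= clipEnd);
  const cues: string[] = [];
  for (let i = 0; i < inClip.length; i += maxWords) {
    const group = inClip.slice(i, i + maxWords);
    const start = group[0].start - clipStart;
    const end = group[group.length - 1].end - clipStart;
    const text = group.map((w) => w.text.trim()).join(" ");
    // SRT cue: index line, "start --> end" line, text, then a blank line.
    cues.push(`${cues.length + 1}\n${srtTime(start)} --> ${srtTime(end)}\n${text}\n`);
  }
  return cues.join("\n");
}
```

Because the cues stay plain text, the editor can let users fix wording or timing before the subtitles are burned in or exported.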
If I were to build this as a SaaS, I would add the following features:
- Manual Editing Tools: Timeline editor, aspect ratio switch, smart cropping, add/edit subtitles, overlays, logos, and animations.
- Optimized Export: Batch export, download, or share directly to TikTok, Instagram Reels, and YouTube Shorts.
- User Accounts & Projects: Dashboard for managing videos, edits, and exports.
- Billing: Free and Pro plans with Stripe integration.
Technical Stack / Development
- Frontend: Next.js 15, TailwindCSS, shadcn/ui
- Backend: Fastify or Next.js API routes
- Video Engine: FFmpeg (WebAssembly/server-side), Remotion
- AI Tools: Whisper (transcription), OpenCV (face detection), Gemini LLM
- Database & Auth: Supabase (PostgreSQL, Auth, Storage)
- Queue Jobs: BullMQ + Redis
- Hosting: Vercel (frontend), Render/Fly.io (backend)
How It Works / User Experience
- Login/Register
- Upload Video or Paste URL
- Processing:
  - Transcription, scene detection, AI clip suggestions
- Preview & Edit Shorts:
  - Choose aspect ratio, trim/edit, add subtitles/overlays
- Export:
  - Download or share
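Scene detection in the Processing step can be approximated by comparing colour histograms of consecutive sampled frames and flagging large jumps as cuts. This is a simple stand-in technique, not necessarily the one the app uses: the sketch assumes the histograms (for example from OpenCV's `calcHist`) are computed elsewhere with the same number of bins, and the threshold and minimum gap are illustrative, untuned defaults.

```typescript
// Histogram for one sampled frame; all histograms share the same bin count.
type Histogram = number[];

// Half the L1 distance between the histograms after normalising each to
// sum 1: 0 means identical distributions, 1 means no overlap at all.
function histogramDistance(a: Histogram, b: Histogram): number {
  const sumA = a.reduce((s, v) => s + v, 0) || 1;
  const sumB = b.reduce((s, v) => s + v, 0) || 1;
  let d = 0;
  for (let i = 0; i < a.length; i++) d += Math.abs(a[i] / sumA - b[i] / sumB);
  return d / 2;
}

// Timestamps (seconds) where a hard cut likely happens. `samplesPerSecond`
// is how many frames per second were sampled to build `hists`.
function detectSceneCuts(hists: Histogram[], samplesPerSecond: number, threshold = 0.4, minGapS = 1): number[] {
  const cuts: number[] = [];
  let lastCut = -Infinity;
  for (let i = 1; i < hists.length; i++) {
    const t = i / samplesPerSecond;
    // A big jump counts only if it is not too close to the previous cut.
    if (histogramDistance(hists[i - 1], hists[i]) > threshold && t - lastCut >= minGapS) {
      cuts.push(t);
      lastCut = t;
    }
  }
  return cuts;
}
```

Cut timestamps like these give the LLM and the clip extractor natural boundaries to prefer, so clips rarely start or end mid-shot.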
The UI is designed for speed and clarity, with responsive design and accessibility in mind. Most clips are generated in under a minute for a 10-minute video.
Challenges & Solutions
- Performance: Ensuring fast processing for large files required queue-based pipelines and efficient use of FFmpeg and OpenCV.
- AI Integration: Combining Whisper, Gemini LLM, and OpenCV for a seamless pipeline was complex, especially for accurate highlight detection and smart cropping.
- Scalability: I used Redis and BullMQ to handle concurrent processing jobs and avoid bottlenecks.
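BullMQ's `Worker` accepts a `concurrency` option that caps how many jobs a worker processes at once. The dependency-free sketch below mimics that idea in plain TypeScript so the behaviour is easy to see; `runWithConcurrency` is a hypothetical helper, not part of BullMQ, and it keeps everything in memory instead of in Redis.

```typescript
// Run async jobs with at most `limit` in flight, returning results in input order.
async function runWithConcurrency<T>(jobs: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0; // index of the next job nobody has started yet

  // Each lane keeps pulling the next unstarted job until none remain.
  // `next++` is safe because JavaScript runs this code on one thread.
  async function lane(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  }

  const laneCount = Math.max(1, Math.min(limit, jobs.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Capping concurrency matters because each FFmpeg or OpenCV job is CPU-heavy; with BullMQ the limit lives on the worker and Redis holds the backlog, so extra uploads wait in the queue instead of overloading the machine.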
Learnings & Takeaways
- Integrating multiple AI and video processing tools is challenging, but the payoff in user experience makes it worthwhile.
- Real-time feedback and previews are crucial for user trust.
- Accessibility and responsive design are non-negotiable for modern web apps.
Future Improvements
- Speaker auto-switching (Zoom-style framing)
- Custom AI prompts for video summarization
- AI voiceovers/narration
- Social media scheduling and analytics
- Direct sharing integrations
Conclusion
Video Shorts Converter aims to empower creators to do more with their content, faster. By automating the tedious parts of video editing, it lets users focus on creativity and storytelling.