How We Rebuilt a Research SaaS Used by 200,000+ Academics
Paperpile was an existing research platform built on PHP that could not scale to its growth trajectory. We redesigned the full architecture in Figma, rewrote the backend in Node.js, migrated the database to MongoDB, and rebuilt the frontend in React, all while maintaining 100% feature parity. The result: 200,000+ active users, performance scores of 95%, and equivalent operations running 3× faster than on the original PHP version.

TL;DR
Paperpile had an existing PHP research platform with a real user base. The architecture could not scale to where they needed to go. We redesigned the full UI system in Figma, rewrote the backend from PHP to Node.js, migrated to MongoDB, and rebuilt the frontend in React, all while keeping the live product running. Eighteen months later: 200,000+ active users, 95% performance scores, 3× speed improvement, 100% feature parity maintained throughout.
The Situation
Paperpile was already a functional product with an active user base of researchers and academics. The platform handled PDF management, citation generation, collaborative editing, and browser extension integration, a technically complex set of operations built on PHP over several years.
The problem: the existing architecture could not support the user growth they were targeting. Adding features was getting slower and more expensive. Performance was degrading as the user base grew. A complete architectural rewrite was necessary, but it had to happen without breaking a product that real users depended on daily.
The Challenges We Solved
Challenge 1: Design a Component System That Works Across Web, Mobile, and Extensions
The PHP interface was functional but inconsistent. Different pages had different interaction patterns, the mobile experience was poor, and there was no shared design language between the web app, mobile app, and browser extension.
What we built in Figma:
- A complete design system with shared component library (buttons, forms, cards, modals, navigation patterns)
- Responsive design specifications for every component across breakpoints
- Interaction states documented for every interactive element
- Accessibility annotations for keyboard navigation and screen reader compatibility
- Design documentation that the engineering team could implement without ambiguity
The Figma design system became the single source of truth for all visual decisions throughout the 18-month build, ensuring consistency across every platform.
Challenge 2: Architect a Backend That Can Scale to Hundreds of Thousands of Users
The PHP backend was a monolithic architecture that did not separate concerns cleanly. Adding a new feature often required touching multiple unrelated parts of the system. Scaling one component meant scaling everything.
Backend architecture we built (Node.js + MongoDB):
- RESTful API design with clearly defined service boundaries
- Microservices structure allowing independent scaling of high-traffic features (PDF processing, citation lookup, collaboration sync)
- MongoDB schema designed for research document storage, flexible enough for varied citation formats and performant at the query patterns the product required
- Multi-layer caching strategy reducing database load for frequently accessed data
- Security architecture protecting research data with encryption at rest and in transit
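To make the multi-layer caching idea concrete, here is a minimal read-through sketch. It is an illustration under stated assumptions, not Paperpile's implementation: `CacheLayer`, `TtlMemoryCache`, and `readThrough` are hypothetical names, both layers are in-process maps, and the calls are synchronous for brevity. In a real deployment the slower layer would likely be a shared store and the loader an asynchronous database query.

```typescript
// Hypothetical two-layer read-through cache. All names are illustrative.
interface CacheLayer<V> {
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

// A simple in-memory layer where each entry expires after a fixed time-to-live.
class TtlMemoryCache<V> implements CacheLayer<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Checks layers from fastest to slowest. On a hit, back-fills the faster
// layers that missed. On a full miss, calls the loader (standing in for the
// database) and fills every layer.
function readThrough<V>(
  key: string,
  layers: CacheLayer<V>[],
  load: (key: string) => V
): V {
  const missed: CacheLayer<V>[] = [];
  for (const layer of layers) {
    const hit = layer.get(key);
    if (hit !== undefined) {
      for (const m of missed) m.set(key, hit);
      return hit;
    }
    missed.push(layer);
  }
  const value = load(key);
  for (const m of missed) m.set(key, value);
  return value;
}
```

The short-lived first layer absorbs repeated reads of the same item, while the longer-lived second layer keeps the database out of the path once the first layer expires.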
Frontend architecture we built (React):
- Component-based structure with shared component library matching the Figma design system
- State management for complex real-time collaboration features
- Client-side routing for smooth navigation without full page reloads
- Code splitting to reduce initial bundle size
- Progressive enhancement ensuring functionality across browser versions
Challenge 3: Migrate Years of Business Logic Without Losing Functionality
Years of PHP code contained product logic that was not fully documented. A direct rewrite without careful mapping risked introducing regressions that would break workflows researchers depended on.
Our migration approach:
- Feature audit: documented every function in the existing PHP codebase with expected input/output behavior
- Test suite built against the existing system to define “correct behavior” before touching any code
- Feature-by-feature rewrite in Node.js, validated against the test suite at each step
- Parallel running: new components operated alongside old ones until validated
- Staged cutover: users migrated in cohorts, with rollback capability at each stage
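The parallel-running and staged-cutover steps above can be sketched as follows. This is an assumption-laden illustration rather than the project's actual tooling: `bucketOf`, `useNewStack`, and `shadowCompare` are hypothetical helpers, the hash is an FNV-1a-style function over UTF-16 code units, and outputs are compared by their JSON serialization.

```typescript
// Stable hash so a given user always lands in the same bucket (0..99),
// regardless of which server handles the request.
function bucketOf(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 100;
}

// A user is served by the new stack when their bucket is below the rollout
// percentage. Lowering the percentage acts as the rollback: the same users
// return to the legacy path.
function useNewStack(userId: string, rolloutPercent: number): boolean {
  return bucketOf(userId) < rolloutPercent;
}

// Parallel run: always answer from the legacy implementation, run the new
// one alongside it, and report any difference for later review.
function shadowCompare<I, O>(
  input: I,
  legacy: (input: I) => O,
  next: (input: I) => O,
  report: (message: string) => void
): O {
  const expected = legacy(input);
  try {
    const actual = next(input);
    if (JSON.stringify(actual) !== JSON.stringify(expected)) {
      report(`mismatch for input ${JSON.stringify(input)}`);
    }
  } catch (err) {
    report(`new implementation threw: ${String(err)}`);
  }
  return expected;
}
```

Because buckets are derived from the user ID rather than chosen at random per request, raising the percentage only ever adds users to the new stack, so a cohort's experience stays consistent between stages.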
The result: 100% feature parity on launch day. No user-facing regressions throughout the 18-month project. Zero forced downtime.
Challenge 4: Performance Engineering for a Data-Heavy Research Application
Paperpile users work with thousands of PDFs and complex citation databases. The PHP version was slow, with sub-par load times for large libraries and noticeable lag in collaborative editing sessions.
Backend performance work:
- MongoDB indexing optimized for the specific query patterns of research document retrieval
- Async processing for PDF analysis and citation extraction, so heavy operations run in the background without blocking the user interface
- CDN integration for static assets, with files served from edge locations closest to the user
- Load balancing configuration for horizontal scaling as user count grows
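As an illustration of the async-processing item above, the sketch below separates accepting work from doing it. It is a simplified, in-memory model with hypothetical names (`JobQueue`, `enqueue`, `processNext`). A production setup would presumably use a durable queue and a separate worker process, neither of which the source describes.

```typescript
type JobStatus = "queued" | "done" | "failed";

interface Job<P> {
  id: number;
  payload: P;
  status: JobStatus;
  result?: string;
}

class JobQueue<P> {
  private jobs = new Map<number, Job<P>>();
  private pending: number[] = [];
  private nextId = 1;

  // Called from the request handler: records the work and returns at once,
  // so the user is never blocked on PDF analysis or citation extraction.
  enqueue(payload: P): number {
    const id = this.nextId++;
    this.jobs.set(id, { id, payload, status: "queued" });
    this.pending.push(id);
    return id;
  }

  // Called by the worker loop: takes the oldest job and runs the handler.
  // Returns false when there is nothing to do.
  processNext(handler: (payload: P) => string): boolean {
    const id = this.pending.shift();
    if (id === undefined) return false;
    const job = this.jobs.get(id);
    if (!job) return false;
    try {
      job.result = handler(job.payload);
      job.status = "done";
    } catch (err) {
      job.result = String(err);
      job.status = "failed";
    }
    return true;
  }

  // Polled by the client to learn whether the work has finished.
  status(id: number): JobStatus | undefined {
    return this.jobs.get(id)?.status;
  }
}
```

The request path only ever touches `enqueue` and `status`, which is what keeps heavy operations off the critical path for the user interface.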
Frontend performance work:
- Code splitting: JavaScript bundles load only what each page needs
- Lazy loading for images, PDFs, and off-screen components
- Bundle optimization reducing total JavaScript payload by 60%
- Core Web Vitals monitoring integrated into the deployment pipeline
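One way to wire Core Web Vitals into a deployment pipeline is a budget check that fails the build when field or lab samples regress. The sketch below is an assumption, not the pipeline the project used: `VitalsSample`, `p75`, and `checkBudget` are hypothetical names, and the budget values are Google's published "good" thresholds (LCP 2.5 s, INP 200 ms, CLS 0.1), assessed at the 75th percentile.

```typescript
interface VitalsSample {
  lcpMs: number; // Largest Contentful Paint, milliseconds
  inpMs: number; // Interaction to Next Paint, milliseconds
  cls: number;   // Cumulative Layout Shift, unitless
}

// Assumed budget: the published "good" thresholds for each metric.
const BUDGET: VitalsSample = { lcpMs: 2500, inpMs: 200, cls: 0.1 };

// 75th percentile using the nearest-rank method.
function p75(values: number[]): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length);
  return sorted[rank - 1];
}

// Returns one message per metric whose 75th percentile exceeds its budget.
// An empty array means the deployment may proceed.
function checkBudget(
  samples: VitalsSample[],
  budget: VitalsSample = BUDGET
): string[] {
  const failures: string[] = [];
  const metrics: (keyof VitalsSample)[] = ["lcpMs", "inpMs", "cls"];
  for (const metric of metrics) {
    const observed = p75(samples.map((s) => s[metric]));
    if (observed > budget[metric]) {
      failures.push(`${metric}: p75 ${observed} exceeds budget ${budget[metric]}`);
    }
  }
  return failures;
}
```

A pipeline step could call `checkBudget` on the latest samples and exit with a non-zero status when the returned list is not empty.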
Outcome: performance scores consistently at 95%, sub-second search across millions of documents, collaborative editing without refresh lag, and equivalent operations running 3× faster than on the original PHP version.
Results
| Metric | PHP Version | Node.js + React Version | Change |
|---|---|---|---|
| Active users | Baseline | 200,000+ | Significant growth |
| Performance score | Below threshold | 95% | Industry-leading |
| Search speed | Slow | Sub-second | 3× improvement |
| Feature additions | Slow, expensive | Fast, modular | Architectural benefit |
| Platform support | Web only | Web + Mobile + Extensions | Expanded |
What Made the Difference
Feature-by-feature migration, not a big-bang rewrite. The most common failure mode in platform rewrites is attempting to rewrite everything simultaneously and launch all at once. Every component that can be isolated and validated independently reduces the overall risk. We isolated 23 distinct feature areas and validated each one before proceeding.
Performance built into the architecture, not bolted on afterward. Performance optimizations added after architecture decisions have limited effect. The ceiling is set by the architecture itself. Building caching, async processing, and database indexing into the design phase rather than the optimization phase meant there was no artificial ceiling to work around later.
Design system first. Building the Figma component library before writing any code meant that engineering decisions could reference a shared visual language from day one. Inconsistencies that appear in design are easy to fix. Inconsistencies discovered during engineering review are expensive.