
How We Evaluate New Projects Before Starting Them

New projects at Lofi Studios: habit-led goals, legible core loops, retention tests, staffing owners, portfolio fit, and kill criteria before production spend.

The most expensive line of code is the first one you write without criteria. Everything after it inherits that choice. At Lofi Studios, we evaluate new projects before starting them with the same seriousness we apply to live ops incidents: calm language, explicit assumptions, and a willingness to say no early.

This essay is a first-hand breakdown of our pre-production framework. It is not a guarantee we never make mistakes. It is an explanation of how we reduce predictable mistakes: starting too big, staffing too thin, confusing a spike for depth, or mistaking excitement for a plan.

If you want the philosophical sibling essay about endings, read what makes a game worth keeping vs killing. If you want portfolio context for why gates matter, read how we think about building multiple games at once.

The intent-led question: what player habit are we trying to create?

Habits beat features

Features are easy to list. Habits are what players actually do on day three. We start by naming the habit: return for social friction, return for mastery, return because the world keeps generating new problems, return because the economy keeps moving.

If we cannot describe the habit, we do not greenlight production. We stay in prototype or paper.

The first session versus the tenth session

The first session is marketing. The tenth session is truth. We ask what changes between them. If the tenth session is identical but slower, we are skeptical. Repetition without evolution is a warning sign we have seen across many Roblox titles, including our own contract-era experiments summarized in what shipping three games in three months teaches you.

Structural legibility: can the team explain the loop without fog?

Two-minute clarity test

Can a designer and an engineer explain the core loop to each other in two minutes and agree? If they disagree, you do not have a loop. You have a mood board.

Systems boundaries

We map what systems must exist for the fantasy to function: economy, progression, conflict, social incentives, onboarding. Missing boundaries become missing accountability later.

This connects to why systems matter more than content. Content can ship fast. Systems decide whether the game survives contact with optimization.

Retention risk: what kills the game after novelty?

Convergence tests

We ask what players will optimize toward and whether that optimization is interesting. If the dominant strategy is boring, the game will become boring at scale.

Scarcity and stakes

Some games need scarcity to matter. Some need stakes. Some need neither, but then they need a different reason to return. We align tools with intent rather than defaulting to generic reward schedules.

Read why progression systems fail without risk for the broader design argument. Pre-production is where you decide whether your title actually wants risk, or only wants the aesthetic of risk.

Staffing truth: who owns this if everything breaks?

Named owners before hype

We assign explicit ownership for live health, creative direction, and engineering architecture before we celebrate a greenlight internally. Names on paper reveal gaps.

Bus factor and specialist scarcity

If one person holds the critical path for networking, economy, or combat feel, we address that early or we shrink scope. Hope is not staffing.

Scope sizing: the smallest version that still tells the truth

Vertical slice discipline

We prefer small slices that behave like the real game over large vertical facades that behave like a trailer. The slice should include failure modes, not only happy paths.

Kill criteria written in advance

We define what failure looks like in observable terms: retention thresholds, convergence speed, exploit severity, team velocity collapse. Fat to Fit test notes reflect the same instinct: stop when behavior tells you the structure is lying.
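One way to keep kill criteria observable rather than vibes-based is to write them as explicit thresholds checked against live metrics. The sketch below is illustrative only: the field names and numbers are assumptions, not our actual thresholds, which vary per project.

```python
from dataclasses import dataclass

@dataclass
class KillCriteria:
    """Illustrative thresholds; real values are project-specific."""
    min_day7_retention: float       # fraction of players returning on day 7
    min_convergence_days: int       # fewer days than this to a dominant strategy is a red flag
    max_open_severe_exploits: int   # severe exploits allowed without a mitigation

def should_kill(metrics: dict, criteria: KillCriteria) -> list[str]:
    """Return the list of kill criteria the project currently violates.
    An empty list means no kill trigger has fired."""
    violations = []
    if metrics["day7_retention"] < criteria.min_day7_retention:
        violations.append("retention below threshold")
    if metrics["convergence_days"] < criteria.min_convergence_days:
        violations.append("players converged on a dominant strategy too fast")
    if metrics["severe_exploits_open"] > criteria.max_open_severe_exploits:
        violations.append("too many unmitigated severe exploits")
    return violations
```

The point is not the specific checks but that each one is something a dashboard can answer without debate.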

Portfolio fit: why this project, why now?

Opportunity cost is real

Because Lofi Studios is expanding beyond a single title, new work competes with existing titles for players and for team commitments. We ask what this project unlocks that we cannot unlock by improving a current title.

Correlated risk

If two projects depend on the same scarce skill in the same quarter, we are not planning two projects. We are planning a traffic jam.
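Correlated risk is easy to check mechanically once plans are written down. A minimal sketch, assuming plans are recorded as (project, quarter, scarce skill) tuples, flags every skill claimed by more than one project in the same quarter:

```python
from collections import defaultdict

def correlated_risks(plans):
    """plans: list of (project, quarter, scarce_skill) tuples.
    Returns {(quarter, skill): [projects]} for every skill claimed by
    more than one project in the same quarter -- the 'traffic jams'."""
    claims = defaultdict(list)
    for project, quarter, skill in plans:
        claims[(quarter, skill)].append(project)
    return {key: projects for key, projects in claims.items() if len(projects) > 1}
```

Anything this returns is a scheduling conversation we want before the quarter starts, not during it.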

Roblox-specific evaluation lenses

Discovery sensitivity

Some ideas require constant acquisition to function. We treat that as a risk flag unless the loop earns organic return. Why Roblox games spike and die so quickly is the blunt external framing; internally we ask whether we are building for a stable habit or for a wave.

Economy and monetization ethics

We evaluate how monetization interacts with fairness and long-term meaning. The hidden cost of free-to-play on Roblox informs how we think about incentives before they harden into systems.

Performance and scale assumptions

Roblox scale is not theoretical. We ask what breaks at concurrency, not only what works in a test server.

Communication tests: can we explain this to players honestly?

Roadmap realism

If our public explanation requires lying by omission, the plan is not ready. Players punish vagueness after the first broken promise.

Community load

Some concepts attract high moderation load. We estimate support and safety costs as part of greenlighting, not as a surprise after launch.

Case patterns from our own history

Contract era: speed teaches

We learned how fast players optimize and how fast loops flatten when structure does not force tradeoffs. Why speed kills most contract-built games is part of our institutional memory.

Stewardship era: long horizons

Acquiring Northwind changed how we think about commitments. Greenlighting now includes a question about whether we want to steward a world for years, not months.

Rebuild era: honesty about foundations

Why we are rebuilding Northern Frontier from scratch is a reminder that evaluation does not end at launch. Sometimes the honest evaluation is "the foundation must reset."

The decision outputs: greenlight, prototype, pause, or kill

Greenlight

Greenlight means staffing, scope, and success metrics are coherent. It is not eternal commitment. It is permission to build seriously.

Prototype

Prototype means we are buying information cheaply. Prototypes must have deadlines and kill criteria.

Pause

Pause means "not now" with a reason: staffing, platform uncertainty, portfolio sequencing.

Kill

Kill means we stop pretending. Kills are better than slow abandonment.

Security, abuse, and "can we operate this responsibly?"

Exploit surface and duplication glitches

We ask what the first wave of exploiters will try. If the answer is "everything," we narrow scope until we can secure a core. Security is not a post-launch garnish.

Social systems and harassment vectors

Player-driven games can be brilliant and dangerous. We evaluate whether chat, trading, and faction systems create predictable harm patterns. If yes, we need a moderation plan before scale, not after outrage.

Financial and sustainability realism

Runway and opportunity cost

Even when specifics stay internal, we ask whether the project can survive a cold discovery month. If the plan requires permanent heat, we treat that as structural risk.

Monetization alignment

We evaluate monetization as part of the loop, not as a sticker added at the end. Why most Roblox monetization strategies fail long-term informs the questions we ask before systems harden.

Technical feasibility: engine realities

Networking and physics constraints

Some fantasies break when latency exists. Pre-production is where we reconcile fantasy with Roblox constraints instead of discovering the mismatch in public.

Tooling and live ops workflows

We ask whether we can ship patches safely, roll back, and monitor health. If tooling is immature, we invest early or shrink scope.

Creative alignment: one vision, not three

Director clarity

Conflicting creative visions do not resolve themselves under crunch. Evaluation includes making sure leadership agrees on the player fantasy and the ethical boundaries.

Art and audio scope discipline

Visual ambition can quietly expand scope. We align art targets with performance budgets and staffing.

What we document before we start

Assumptions log

We write assumptions down: audience, session length, competitor set, platform roadmap sensitivity. Assumptions should be revisitable, not tribal memory.

Risk register

We list top risks and mitigations. If mitigations are missing, we are not ready.

Success metrics that players would recognize

We tie internal metrics to player-visible health: fairness, variety, stability, respect for time. Vanity dashboards are seductive and dangerous.
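One concrete way to enforce this tie is to require every dashboard metric to map to the player-visible quality it protects, and to treat anything unmapped as vanity by default. The metric names below are hypothetical examples, not our real dashboard:

```python
# Illustrative mapping from internal metrics to the player-visible
# quality each one protects. Unmapped metrics are treated as vanity.
PLAYER_VISIBLE = {
    "day7_retention": "respect for time",
    "match_fairness_index": "fairness",
    "crash_free_sessions": "stability",
    "distinct_strategies_used": "variety",
}

def vanity_metrics(dashboard: list[str]) -> list[str]:
    """Return the dashboard metrics with no player-visible counterpart."""
    return [m for m in dashboard if m not in PLAYER_VISIBLE]
```

A non-empty result is not forbidden, but it has to be justified rather than assumed.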

Competitive clarity: what are we actually competing with?

Substitutes, not only genre labels

Players do not care about our internal genre tags. They care about what else they could play in the next thirty minutes. We name substitutes honestly: other Roblox experiences, other platforms, and offline time.

Differentiation that survives contact

If differentiation is only cosmetic, players will notice. If differentiation is systemic, it shows up in behavior. What most games get wrong is a useful anchor: players quit when the system stops rewarding attention.

Post-greenlight discipline

Re-evaluate at milestones

Greenlight is not immunity. We schedule checkpoints after first public tests, first monetization touchpoints, and first concurrency spikes. A plan that made sense on paper can fail reality quickly.

Keep kill criteria alive

Teams should not forget the conditions that would stop the project. Memory drifts. Written criteria stay.

Evaluation does not end when excitement peaks. It continues until the game either earns a long horizon or earns an honest stop. That continuity is what keeps a studio trustworthy over years, not weeks, especially on Roblox.

Frequently asked questions

Do you use a formal scoring rubric?

We use structured questions and written assumptions more than a single numeric score. Numbers can fake precision. The goal is shared clarity.

How long does evaluation take?

It depends on risk. Low-risk prototypes can move fast. High-risk commitments take longer because the cost of error is larger.

Who makes the final call?

Leadership owns decisions, but evaluation should surface disagreements early. If only one person sees the risk, the process failed.

Can community hype influence greenlights?

We listen to community desire, but we do not treat hype as evidence of structural health. Hype is volatile. Systems are what you ship.


Thanks for reading, and for playing with us on Roblox.