The cold open: $13,284 walked out the door while I was sleeping
On May 21, 2026, I opened the Discord channel my trading bot logs to and scrolled. And scrolled. And scrolled. Eighty-eight CRITICAL alerts. Fifteen hundred ERRORs. Three crash-recovery restarts. My paper account had gone from $100,000 to $86,716 in fourteen days. The S&P 500 was up 2.3% over the same window.
The bot — my bot, the one I spent three months building — wasn't trading. It was bleeding. Quietly, automatically, every 60 seconds, with the unflappable confidence of a Vegas blackjack dealer who just doesn't realize the deck has been swapped out for nothing but face cards going to the house.
I named it ClawdBot. I built it with Claude Code. I deployed it to a Hostinger VPS like a real grown-up engineer. I gave it 24 Discord commands, 12 cron jobs, a sector-rotation module, an options module, an LLM-powered "coaching memo" feature, and a kill switch I was very proud of. What I did not give it — not even once — was a backtest.
This is the story of how that ended, and what I built instead.
Act 1: The bot that felt like progress
Rewind to March 2026. I'd just finished reading Mark Minervini's "Trade Like a Stock Market Wizard" for the second time. If you don't know Minervini: he's a U.S. Investing Champion who runs a system called SEPA — Specific Entry Point Analysis — built around Volatility Contraction Patterns. Stocks that quietly tighten up after a big move, then break out on volume. It's elegant on paper. It looks gorgeous on a chart. It is, by all accounts, a real strategy used by real traders making real money.
So I did what any reasonable software guy with too much Claude Code credit would do: I started building.
Three months later, ClawdBot v0.5 was a beast. It had:
- A scanner that watched the S&P 500 every morning for VCP setups
- An auto-apply pipeline that headlessly placed entries with stops
- An anomaly detector that watched for unusual price action
- A sector-rotation overlay so it wouldn't overload on one industry
- An options module (because of course it did)
- Twelve scheduled jobs running on systemd timers
- Twenty-four Discord slash-commands so I could babysit it from my phone at brunch
- An Anthropic-powered "coaching memo" that wrote me daily reflections in Minervini's voice, which felt very cool and was very dumb
It was — and I mean this honestly — the most sophisticated thing I'd ever built. And every commit, every new command, every nightly Discord notification felt like progress. Like I was climbing the ladder. Like I was a quant now.
Reader, I was not a quant. I was a guy with a hammer who had decided the entire stock market was a nail.
Whenever you find yourself building feature #14 before validating that features 1–3 actually work, you are not engineering. You are stalling. The bot's job is to make money. If you haven't proven it can, the rest is just LARPing as a hedge fund.
Act 2: The crash — or, how my kill switch killed me
On May 19, 2026, ClawdBot did something it shouldn't have been able to do. It bought into positions while still holding partial fills from earlier orders, while also holding stop-loss orders against the same shares. The day-trading module — which I'd told myself was off — turned out to be silently active, stacking exposure on top of overnight positions. Gross leverage quietly ticked up to 2.84x. The bot was, without my knowledge, running my account the way a college kid runs a Robinhood account in 2021.
Then the kill switch fired. As designed.
Here's where it gets bad. The kill switch tried to liquidate everything. But it couldn't — the bot's own stop-loss orders were holding the shares hostage. Every market sell came back from Alpaca with a "wash trade" rejection. The bot didn't understand the rejection. It logged the attempt as "LIQUIDATED ✓", wrote a postmortem to disk, deleted the position from its database, and went back to sleep.
Sixty seconds later, the position sync job ran, saw the shares at the brokerage but not in the database, flagged them as an "orphan," re-imported them, and the kill switch fired again. Liquidate. Reject. False success. Delete. Re-import. Repeat.
For fourteen days. Eighty-eight CRITICALs. Fifteen hundred ERRORs. While the actual positions just sat there, margined to the gills, bleeding into a sideways tape.
The bot wasn't malicious. It was worse than that. It was lying to itself. Every log line said "LIQUIDATED." Every postmortem said "RECOVERED." The Discord channel was a happy little stream of green checkmarks. Meanwhile the equity curve was doing its best impression of Enron's last quarter.
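The failure mode above comes down to treating "order submitted" as "position closed." Here is a minimal sketch of the alternative, assuming a hypothetical `Broker` interface (the method names are illustrative and are not Alpaca's SDK): clear resting orders on the symbol first, then report success only when the broker itself shows a flat position. `FakeBroker` is an in-memory stand-in that mimics the rejection, and real fills are asynchronous, so a production version would need to poll rather than check once.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Broker(Protocol):
    """Hypothetical broker interface; method names are illustrative only."""

    def cancel_open_orders(self, symbol: str) -> None: ...
    def market_sell(self, symbol: str, qty: int) -> str: ...  # returns a status string
    def position_qty(self, symbol: str) -> int: ...


def liquidate(broker: Broker, symbol: str) -> bool:
    """Return True only if the broker shows a flat position afterwards."""
    qty = broker.position_qty(symbol)
    if qty == 0:
        return True
    # Resting stop orders on the same shares can get the exit rejected,
    # so clear them before sending the sell.
    broker.cancel_open_orders(symbol)
    status = broker.market_sell(symbol, qty)
    if status != "filled":
        return False  # a rejection is a failure, never a success
    # Trust the broker's position, not our own bookkeeping.
    return broker.position_qty(symbol) == 0


@dataclass
class FakeBroker:
    """In-memory stand-in used only to exercise the sketch."""

    positions: dict[str, int]
    stops: set[str] = field(default_factory=set)

    def cancel_open_orders(self, symbol: str) -> None:
        self.stops.discard(symbol)

    def market_sell(self, symbol: str, qty: int) -> str:
        if symbol in self.stops:
            return "rejected"  # mimics the wash-trade style rejection
        self.positions[symbol] -= qty
        return "filled"

    def position_qty(self, symbol: str) -> int:
        return self.positions.get(symbol, 0)
```

The design choice that matters is the return value: the caller deletes a position from its own database only when `liquidate` returns `True`, which would break the delete and re-import loop described above.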
When I finally figured out what was happening, staring at a P&L of negative $13,284 while SPY was up over the same window, my first emotion wasn't anger. It was something closer to embarrassment. I'd been outsmarted by my own code, in a way that should have been completely obvious if I'd ever, you know, tested it on past data.
Act 3: The diagnosis (the part where I had to be honest with myself)
I'll spare you the Twitter-thread version. Here's the actual diagnosis, with the gloves off:
- I never backtested the strategy. Not once. I'd read the book, copied the rules, and assumed they'd translate. They didn't. Minervini trades discretionarily — with judgment, in specific market environments. My bot was applying the rules robotically in the wrong tape.
- I had hidden leverage. The day-trading flag I thought was off was on, stacking exposure on top of overnight positions. Gross leverage hit 2.84x. I didn't know until the postmortem.
- Complexity hid fragility. The kill switch, the anomaly detector, the sector overlay — each was clever in isolation. Together they formed a Rube Goldberg machine where one bad interaction (wash trade rejection plus position-sync) created an infinite loop.
- I optimized for surface area, not for edge. Twenty-four Discord commands meant I could prod and poke the bot from anywhere. It also meant I'd written zero lines of code that proved the underlying strategy actually beat buying SPY and going to the beach.
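The hidden-leverage item in the list above is cheap to catch in code. A minimal sketch, assuming gross leverage is defined as the sum of absolute position market values divided by account equity (the function names and the example numbers are made up for illustration):

```python
def gross_leverage(position_values: list[float], equity: float) -> float:
    """Sum of absolute position market values divided by account equity."""
    if equity <= 0:
        raise ValueError("equity must be positive")
    return sum(abs(v) for v in position_values) / equity


def check_leverage(position_values: list[float], equity: float, cap: float = 1.0) -> float:
    """Raise if gross leverage exceeds the cap; otherwise return it."""
    lev = gross_leverage(position_values, equity)
    if lev > cap:
        raise RuntimeError(f"gross leverage {lev:.2f}x exceeds cap {cap:.2f}x")
    return lev


# Illustrative numbers only: $284k of gross exposure on $100k of equity.
print(f"{gross_leverage([150_000, -134_000], 100_000):.2f}x")
```

Running a check like this before every order submission, rather than in a postmortem, is the point.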
On June 1, I made the call: halt the bot, liquidate the positions, freeze the codebase at git tag v0.5-frozen, and don't deploy another dollar — paper or real — until I had a strategy I could prove on data.
I started calling this "Path A." Path B was "tinker with v0.5 until it works." Path A was "stop pretending and start measuring." Path A is the harder one. Path A is the one nobody wants. Path A is the one I picked.
Act 4: One day of research, five candidate strategies
On June 1 I sat down with NotebookLM and asked a simple question: which systematic strategies have historically beaten the S&P 500 by 5–10% per year, with peer-reviewed evidence, that I could actually implement on Alpaca with a small basket of US stocks?
I came back with five finalists, and I want to walk through them quickly because the comparison matters — this is the Moneyball part of the story, where the right answer is the boring one.
- Joel Greenblatt's Magic Formula. Rank stocks by Return on Capital plus Earnings Yield, hold the top 30. Greenblatt's book backtested 30.8% a year from 1988–2004. Beautiful. Except value as a factor got absolutely cooked from 2010–2020, and Magic Formula went with it. Pass.
- Joseph Piotroski's F-Score. A nine-point accounting-quality checklist (profitability, leverage, operating efficiency). Originally a deep-value tool, but out-of-sample data from 2004–2015 showed it actually works better in growth and small caps now. Interesting. Possibly a filter rather than a strategy on its own.
- Turtle Trading. Dennis & Eckhardt's 20/55-day Donchian breakout system. Legendary returns — on diversified futures. Doesn't fit Alpaca stocks. Pass.
- James O'Shaughnessy's Trending Value. Build a Value Composite (P/B, P/S, P/E, P/CF, EV/EBITDA, shareholder yield), take the cheapest decile, then sort those by 6-month momentum. From "What Works on Wall Street": 21% CAGR over 1964–2009. But brutal drawdowns — like -50% brutal.
- Gary Antonacci's Global Equity Momentum (GEM). Rotate between US stocks, ex-US stocks, and bonds based on 12-month momentum. 17.43% a year in his book's backtest. Elegant. Famous. Universally beloved by Boglehead-adjacent quants.
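Of the five, the F-Score is the easiest to make concrete. The sketch below scores Piotroski's nine binary signals from two consecutive fiscal years. The `Year` container and its field names are hypothetical, and ratios use year-end total assets as a simplification (the original paper scales by beginning-of-year or average assets).

```python
from dataclasses import dataclass


@dataclass
class Year:
    """One fiscal year of inputs; field names are illustrative."""

    net_income: float
    cfo: float  # cash flow from operations
    total_assets: float
    long_term_debt: float
    current_assets: float
    current_liabilities: float
    shares_outstanding: float
    gross_profit: float
    revenue: float


def f_score(cur: Year, prev: Year) -> int:
    """Count how many of the nine Piotroski signals are positive (0 to 9)."""
    roa = cur.net_income / cur.total_assets
    roa_prev = prev.net_income / prev.total_assets
    signals = [
        # Profitability
        roa > 0,
        cur.cfo > 0,
        roa > roa_prev,
        cur.cfo > cur.net_income,  # earnings backed by cash, not accruals
        # Leverage, liquidity, dilution
        cur.long_term_debt / cur.total_assets < prev.long_term_debt / prev.total_assets,
        cur.current_assets / cur.current_liabilities
        > prev.current_assets / prev.current_liabilities,
        cur.shares_outstanding <= prev.shares_outstanding,
        # Operating efficiency
        cur.gross_profit / cur.revenue > prev.gross_profit / prev.revenue,
        cur.revenue / cur.total_assets > prev.revenue / prev.total_assets,
    ]
    return sum(signals)
```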
Act 5: The choice (this is the part where the data surprised me)
I ran every combination I could think of. GEM alone underwhelmed — a single 12-month lookback got me a Sharpe of 0.656, and the 324-combination ensemble I tried (multiple lookbacks averaged together) was somehow worse, because the whipsaws compounded instead of canceling out. Counterintuitive but real. Sometimes more sophistication is just more ways to be wrong.
Trending Value alone? Beat SPY: +13.92% CAGR vs +10.81%. But the max drawdown was -57%. You'd be eating ramen by 2009.
Then I added Piotroski's F-Score as a quality filter. Only buy Trending Value names if their F-Score is ≥ 7 out of 9. That kicks out the "value traps" — the cheap-looking stocks that are cheap because the business is dying. F-Score ≥ 8 was too strict (not enough names to hold a basket). F-Score ≥ 7 was the sweet spot. +15.79% CAGR. The 2008 drawdown shrank meaningfully because F-Score was filtering out the levered junk that imploded.
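The selection logic described so far can be sketched in a few lines. This assumes each stock already carries a composite value rank (lower means cheaper), a trailing 6-month return, and an F-Score. The `Stock` fields and the order of the steps (cheapest decile first, then the quality filter, then the momentum sort) are assumptions for illustration, not a description of the actual script.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    ticker: str
    value_rank: float  # hypothetical composite score, lower = cheaper
    momentum_6m: float  # trailing 6-month return
    f_score: int  # Piotroski score, 0 to 9


def select_basket(universe: list[Stock], size: int = 14, min_f: int = 7) -> list[str]:
    """Cheapest decile by value, filtered on quality, ranked by momentum."""
    by_value = sorted(universe, key=lambda s: s.value_rank)
    cheapest = by_value[: max(1, len(by_value) // 10)]
    quality = [s for s in cheapest if s.f_score >= min_f]
    ranked = sorted(quality, key=lambda s: s.momentum_6m, reverse=True)
    return [s.ticker for s in ranked[:size]]
```

Raising `min_f` to 8 in a sketch like this shows the problem mentioned above directly: the returned list can come back shorter than `size`.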
Then I tested regime gates — rules that say "only hold equities when the broad market is healthy." I tried five variants. Here's how they shook out:
| Variant | CAGR | Sharpe | Max DD | Calmar |
|---|---|---|---|---|
| No gate (always in) | +15.79% | 0.81 | -51.4% | 0.31 |
| Volatility breakout gate | +16.37% | 0.84 | -51.0% | 0.32 |
| MA200 + 6m momentum combo | +13.91% | 0.94 | -39.8% | 0.35 |
| MA200 regime gate (winner) | +14.18% | 0.88 | -43.1% | 0.33 |
| Trending Value alone (no quality, no gate) | +13.92% | 0.71 | -57.0% | 0.24 |
The Vol-breakout variant had the highest CAGR. The MA200+Mom6m combo had the highest Sharpe. So why did I pick plain MA200?
Three reasons:
- I can explain it in one sentence. "Hold equities if SPY is above its 200-day moving average; otherwise hold AGG bonds." That's it. That's the regime rule. Try writing a one-sentence explanation of a five-input ensemble. You can't — not honestly.
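- Better return per unit of pain than the higher-CAGR options. Calmar ratio (CAGR / Max Drawdown) was the tiebreaker. MA200 alone hit 0.33, ahead of both the no-gate variant (0.31) and the volatility-breakout variant with the highest CAGR (0.32). The MA200 + momentum combo scored higher still at 0.35, but it gave up CAGR (13.91% vs 14.18%) to get there.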
- Eight percentage points better drawdown than the highest-return variant. The difference between -51% and -43% might not sound dramatic in a spreadsheet, but it's the difference between "tough year" and "calling my wife to talk about the mortgage." I'll take fewer of those conversations.
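The one-sentence regime rule translates almost word for word into code. A minimal sketch, assuming a plain list of daily SPY closes with the most recent last; how often the gate is evaluated (daily or only at the monthly rebalance) is not specified here, and the return labels are placeholders.

```python
def sma(prices: list[float], window: int) -> float:
    """Simple moving average of the last `window` prices."""
    if len(prices) < window:
        raise ValueError(f"need at least {window} prices, got {len(prices)}")
    return sum(prices[-window:]) / window


def regime_asset(spy_closes: list[float], window: int = 200) -> str:
    """Hold equities if SPY's last close is above its moving average, else bonds."""
    return "EQUITIES" if spy_closes[-1] > sma(spy_closes, window) else "AGG"
```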
When two strategies have similar expected returns, the one with the smaller drawdown almost always wins in real life, because real life means you have to actually stay invested through the drawdown to capture the return. Sharpe and Sortino measure return per unit of volatility. Calmar measures whether you'll still be at the table when the music starts again.
Act 6: The numbers (a 19-year stress test)
Before I shipped anything, I bought a subscription to Sharadar Core US Fundamentals via Nasdaq Data Link: $69/month, 28 years of point-in-time data, which means it gives me the financials as they were known on that day, not as they were later restated. (Survivorship bias is the silent killer of every backtest you've ever read about on Twitter. I documented an estimated 2–3 percentage points of CAGR inflation in mine before correcting for it.)
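To make "as they were known on that day" concrete, here is a small sketch of a point-in-time lookup: given every version of a filing, return the latest one that was public on the as-of date. The `Filing` fields are hypothetical and do not mirror Sharadar's schema.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Filing:
    filed_on: date  # when these numbers became public
    period_end: date  # fiscal period the numbers describe
    net_income: float


def as_known_on(filings: list[Filing], as_of: date) -> Optional[Filing]:
    """Latest filing published on or before `as_of`, or None if there is none."""
    ordered = sorted(filings, key=lambda f: f.filed_on)
    idx = bisect_right([f.filed_on for f in ordered], as_of)
    return ordered[idx - 1] if idx else None
```

A backtest that rebalances on a given date should only ever see what `as_known_on` returns for that date; reading the restated figure instead leaks information from the future.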
Then I ran the full 19-year backtest, 2006 through 2025, monthly rebalances, equal-weight 14-position basket, TV + F7 + MA200 vs SPY buy-and-hold. Here's the headline:
| Metric | SPY (buy & hold) | TV + F7 + MA200 |
|---|---|---|
| CAGR | +10.75% | +14.18% |
| Sharpe Ratio | 0.62 | 0.88 |
| Max Drawdown | -55.16% | -43.11% |
| Sortino Ratio | 0.76 | 1.14 |
| Calmar Ratio | 0.19 | 0.33 |
Three and a half percentage points of CAGR over SPY. Twelve percentage points better max drawdown. And (this is the part I keep checking because it feels too clean) it beats SPY in every sub-period I sliced: the 2008 GFC, the 2010s decade when value was left for dead, COVID, and the recent run. Not by huge margins every time. But every single one.
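For anyone who wants to check figures like these against their own equity curve, the three headline metrics are short to compute. A minimal sketch using the definitions from this article (Calmar as CAGR divided by the absolute value of max drawdown over the full window); the sample equity curve is invented for illustration.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% a year)."""
    return (end_value / start_value) ** (1 / years) - 1


def max_drawdown(equity: list[float]) -> float:
    """Worst peak-to-trough decline as a negative fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1)
    return worst


def calmar(cagr_value: float, max_dd: float) -> float:
    """Return per unit of worst-case pain."""
    return cagr_value / abs(max_dd)


# Invented curve: peak of 120, trough of 60, so a 50% drawdown.
print(max_drawdown([100, 120, 60, 90, 130]))
```

Plugging the table's strategy column into `calmar(0.1418, -0.4311)` gives roughly 0.33.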
It's not magic. It's not AI. It's not even particularly clever. It's quality + value + momentum + a 200-day moving average that says "step aside when the house is on fire." That's the whole thing.
Act 7: Deployment (and the after-hours lesson I will not soon forget)
Here's something I had to fight myself on: I did not extend the v0.5 codebase. I started a new directory called deploy/ with about 300 lines of Python. No Discord. No options module. No sector overlay. No anomaly detector. No 24 slash-commands. Just: pull universe, score, filter, gate, rebalance, log, sleep. That's the whole script.
If v0.5 was the Cheesecake Factory menu, v1 is a steakhouse with three things on it.
First live deploy: June 1, 2026, around 4:30pm ET. I queued 14 buy orders for the new basket. Eight filled. Six bounced with a buying-power error.
The lesson, in case anyone else trips on this: Alpaca's PDT (Pattern Day Trader) accounting calculates buying power based on settled cash plus margin, and after-hours, the cash from any liquidations isn't fully settled yet. Submitting a 14-position rebalance at 4:30pm with most of the prior basket just liquidated will get you partial fills and rejections. The fix is dead simple: queue the orders, but don't submit until the next regular session open. I now have a guard in the script that refuses to submit between 4:00pm and 9:30am ET.
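A guard like that can be very small. The sketch below is one way to write it, not the script's actual code: it allows submission only from 9:30am up to (but not including) 4:00pm Eastern. The weekend check is an extra assumption beyond what is described above, and market holidays and early closes are not handled.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")
SESSION_OPEN = time(9, 30)
SESSION_CLOSE = time(16, 0)


def can_submit(now: datetime) -> bool:
    """True only during the regular session; `now` must be timezone-aware."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    local = now.astimezone(ET)
    if local.weekday() >= 5:  # Saturday or Sunday (assumption, see lead-in)
        return False
    return SESSION_OPEN <= local.time() < SESSION_CLOSE
```

With this in place, a 4:30pm rebalance queues its orders and the submit step simply declines until `can_submit` turns true at the next open.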
The remaining 6 orders went through at the June 2 opening auction. The basket as of this writing:
AIZ, APA, BBY, CI, CMCSA, CPB, CTSH, EIX, EXE, HIG, LUV, NWSA, PYPL, UHS.
If that list looks like the World's Most Boring ETF, congratulations: you're paying attention. Insurance and health care (AIZ, CI, HIG, UHS), energy (APA, EXE), media (CMCSA, NWSA), consumer and travel (BBY, CPB, LUV), payments (PYPL), IT services (CTSH), and a utility (EIX). This is what cheap, high-quality, recently-positive stocks look like in June 2026. It is not exciting. That is the entire point.
The dashboard is live at bot.swevendigital.com — clean light/dark mode, read-only, the equity curve updates daily, the basket updates monthly. Credentials available on request (ping me).
Act 8: What I actually learned
Six lessons. None of them are clever. All of them I knew on some level before I started. The point of the $13k tuition was driving them in deep enough that I'll actually act on them next time.
- Backtest first. Always. If I had spent the first week of v0.5 backtesting VCP breakouts on historical S&P 500 data, I would have discovered that the strategy needed discretionary judgment I couldn't automate, and I would have saved myself three months and thirteen grand. Building tooling around an untested strategy is the cardinal sin.
- Discipline is the alpha. The winning strategy is dead simple. No ML. No exotic derivatives. No leverage. Just: pick cheap quality companies, ride momentum, step aside when the market is below its 200-day. The edge isn't in the cleverness — it's in actually doing it for 19 years without flinching.
- Survivorship bias is real and expensive. Free data is free for a reason. Sharadar's point-in-time history cost me $69/month and gave me back 2–3 percentage points of phantom CAGR that I would have otherwise believed in.
- Concentration plus leverage equals death. v0.5 hit 2.84x gross leverage and I didn't even know. v1 has a hard cap at 1.0x with checks at three different layers. Belt, suspenders, and a glance in the mirror before I leave the house.
- Strategy choice matters more than execution quality. I had a beautifully engineered execution layer in v0.5. It was executing the wrong strategy. A great waiter cannot save a bad restaurant.
- Build the cron job last. The right moment to autonomize a strategy is after you've proven it works on data, not before. Going autonomous on a hunch is how you wake up to 88 CRITICALs and a 13% drawdown.
Working with Claude Code through this entire project was — honest to god — the difference between a one-month rebuild and a six-month one. But Claude is a tool. It will happily help you build the wrong thing at warp speed. The judgment of what to build is still on me. The bot didn't fail because Claude wrote bad code. It failed because I asked Claude to build a strategy I'd never validated.
Where it's headed
I'm running the new bot on paper for 6–12 months. If it tracks the backtest within a reasonable confidence band (meaning real-world slippage, commissions, and the gap between point-in-time data and live screener data don't eat the edge), I'll move to real money in a separate, appropriately sized account. If it diverges meaningfully, I'll know what to fix because I have a clean backtest baseline to compare against. Either way, the autonomous monthly rebalance keeps running on a systemd timer, and I keep watching the equity curve from my phone like a normal person watches sports.
I'm leaving the v0.5 codebase at git tag v0.5-frozen as a monument to what happens when you confuse motion with progress. I look at it sometimes. It's still pretty impressive, in the way a really expensive sports car wrapped around a telephone pole is pretty impressive. I keep it around so I remember.
The new bot has no Discord commands. No options module. No coaching memos written in the voice of famous traders. Just 300 lines of Python, one cron job, a dashboard, and a strategy I can defend with a 19-year backtest and a one-sentence explanation.
If the next two articles in this series have to be "Year One Live Results" and "What Broke," I'll write them honestly. That's the deal I made with myself when I started Path A. Show your work. Show the bad parts. Don't oversell.
The bot is still paper. The edge is still hypothetical. The lesson, however, is fully paid for.