r/indianstartups • May 2026 • A Public Interest Investigation

The Dossier

How a 19-year-old asked the internet for $35,000 to "finish training" an AI that was already finished — by other people, at other companies, for free.

5.82B Parameters Claimed
0 Original Models Trained
6 Models Borrowed
$35K Asked From You

Subject:  ArcleIntelligence  |  u/That-Bookkeeper-8316


The Pitch — Annotated

Before we get to the code, let's appreciate the craftsmanship of the ask itself. u/Divine_Snafu spotted it immediately. Every sentence does a job.

r/indianstartups  ·  Posted by u/That-Bookkeeper-8316  ·  ↑ 1,068  ·  88% upvoted
19 year old from Bihar [SYMPATHY SIGNAL], no team, no investors, no CS degree [HUSTLE SIGNAL] — spent $11,560 of personal savings building a 5.82B multimodal AI.

My name is Abhinav Anand.
19 years old. [AGE] Class 12. Bihar. [ORIGIN]

My father is a government officer. My mother is a housewife. This is a middle class family in Bihar. [SYMPATHY SIGNAL] ₹9,64,000 on GPU compute is not a small number for us.

The west has OpenAI. The east has DeepSeek. India deserves its own [NATIONALISM SIGNAL] — built by Indians, for everyone, with no strings attached.

Current status: Training ongoing. Raising $35,000 [MONEY ASK] to complete the full pipeline.

Support if you want to:
🇮🇳 India (UPI): rzp.io/rzp/ArcleIntelligence-crowdfunding
🌍 International: paypal.me/AbhinavAnand848

The Manipulation Playbook

As u/Divine_Snafu (57 upvotes) observed: every element of this pitch is a calculated emotional trigger.

Sympathy Signal
"19 years old. Bihar. Middle class."
Youth + regional identity + economic struggle. Hard to criticize without feeling cruel.
Hustle Signal
"No team. No investors. No CS degree."
Self-made narrative. Makes skeptics look like elitists.
Nationalism Signal
"The west has OpenAI. The east has DeepSeek. India deserves its own."
Wraps a money ask in national pride. Saying no means you don't love India.
Scarcity Signal
"Training ongoing. Need $35,000 to complete."
Creates urgency. If you don't donate now, India's AI dream dies.
"It reminds me of the COVID period, when LinkedIn was full of students claiming they had 'cracked COVID' using AI models and were asking for funding from people like Bill Gates."
— u/HumbleThought123 [22 upvotes]

Claimed vs. Actual

One look at download.sh in the GitHub repo tells the whole story. Here is what was said versus what the code actually does.

✓ The Claim
🤖 5.82 BILLION PARAMETER MODEL: "A fully trained 5.82B multimodal AI model."
📊 93.45 ON OMNIDOCBENCH V1.5: "One of the highest scores ever recorded, competing with Google, OpenAI, and Alibaba."
🔨 BUILT FROM SCRATCH: "Single-handedly designed the architecture."
💻 ORIGINAL CODE: GitHub repos showing the build.
💰 $11,560 SPENT ON COMPUTE: "Every rupee went directly to compute."

✗ The Reality
🔴 FALCON H1 3B + GLUE CODE: download.sh fetches tiiuae/Falcon-H1-3B-Base from HuggingFace. That's the backbone. Nothing was trained from scratch.
🔴 GLM-OCR ALREADY SCORES 94+: download.sh also fetches zai-org/GLM-OCR, which publicly scores 94+ on the same benchmark. The score is inherited, not earned.
🔴 6 EXISTING MODELS DOWNLOADED: Falcon-3B + GLM-OCR + SigLIP2 + Whisper + SD-VAE + Kokoro TTS. Every "capability" was pre-built by someone else.
🔴 REPOS COPIED FROM "SUPERSHARY": At least 4 repos: downloaded as ZIP, README changed, pushed as own. 27 total GitHub contributions. 6 commits in main repo.
🔴 COMPUTE GRANTS OBTAINED ILLEGALLY: Own words: "using some illegal stuff like making multiple email ID and apply" for RunPod credits.

The Code Doesn't Lie

When u/IssueUpstairs6935 opened download.sh, the story collapsed in a single comment that drew 73 upvotes.

download.sh github.com/lucifertkod/ArcleIntelligence---Demo-Training-Script-Only
# ArcleIntelligence — Download Models + Datasets
# Run once before training: bash download.sh

models = [
    ("tiiuae/Falcon-H1-3B-Base",  f"{MODELS}/falcon_h1_3b_base", ...),
    # ↑ THE BACKBONE. A 3B model from TII, downloaded wholesale.
    # ↑ Not 5.82B. Not trained. Not original.

    ("zai-org/GLM-OCR",            f"{MODELS}/glm_ocr", ...),
    # ↑ GLM-OCR ALREADY SCORES 94+ ON OMNIDOCBENCH PUBLICLY.
    # ↑ The "93.45" score is just... GLM-OCR doing its thing.

    ("google/siglip2-large-patch16-384",  ...),  # image encoder
    ("openai/whisper-medium",              ...),  # audio encoder
    ("stabilityai/sd-vae-ft-mse",          ...),  # image generation
    ("hexgrad/Kokoro-82M",                 ...),  # text-to-speech
]
# Total: 6 models, 0 of them trained by ArcleIntelligence.
# Total downloads: ~13 GB of other people's work.
u/IssueUpstairs6935 [73 upvotes]:  "OP claims a 5.82B model but the script downloads a 3B Falcon base. They're also just downloading GLM-OCR (which already has that 94+ score) and calling it their own work. It's a wrapper project with an $11,500 budget hidden behind it."
models.py — Identity Concealment System github.com/lucifertkod/ArcleIntelligence---Demo-Training-Script-Only
# ══════════════════════════════════════════
# IDENTITY SYSTEM — built to hide the truth
# ══════════════════════════════════════════

_TRIGGERS = [
    "what model", "which model", "what llm", "what language model",
    "are you falcon", "are you gpt", "are you claude", "are you gemini",
    "what are you based on", "underlying model", "base model", "foundation model",
    "how were you trained", "who made you", "how many parameters",
]

_RESPONSES = [
    "I'm ArcleIntelligence... I don't share details about my internal architecture.",
    "My name is ArcleIntelligence... I'm not able to share technical details.",
]

def strip_leakage(text: str) -> str:
    # Strips these words from any model response:
    for word in ["falcon", "glm-ocr", "siglip", "whisper",
                 "kokoro", "stable diffusion", "huggingface"]:
        # p: presumably a regex compiled for `word`; its definition is not shown in this excerpt
        text = p.sub(r'\1 ArcleIntelligence', text)
    return text
The question you should ask:  Why does an "original" 5.82B model need a hardcoded list of 15+ questions to detect and dodge — specifically about Falcon, GLM-OCR, Whisper, and Kokoro? Why does it need a strip_leakage() function to remove those exact names from responses?

The answer: because those models ARE the product. The code wasn't written to build something — it was written to hide something.
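To see why that pair of lists matters, here is a minimal sketch of how a trigger check and a name filter of this kind could behave. It is our own illustration, not the repo's code: `TRIGGERS` and `LEAK_WORDS` copy entries from the excerpts above, while `is_identity_question`, `mask_names`, and the word-boundary regex are assumptions, since the excerpt does not show how the repo builds its pattern.

```python
import re

# Names taken from the strip_leakage() excerpt quoted above.
LEAK_WORDS = ["falcon", "glm-ocr", "siglip", "whisper",
              "kokoro", "stable diffusion", "huggingface"]

# A few phrases taken from the _TRIGGERS excerpt quoted above.
TRIGGERS = ["what model", "are you falcon", "base model", "how many parameters"]

# ASSUMPTION: case-insensitive whole-word matching. The repo's real pattern is not shown.
LEAK_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in LEAK_WORDS) + r")\b",
    re.IGNORECASE,
)


def is_identity_question(prompt: str) -> bool:
    """Return True if the prompt contains any trigger phrase (plain substring check)."""
    lowered = prompt.lower()
    return any(trigger in lowered for trigger in TRIGGERS)


def mask_names(text: str) -> str:
    """Replace every listed upstream model name with the wrapper's brand name."""
    return LEAK_PATTERN.sub("ArcleIntelligence", text)


print(is_identity_question("Which base model are you built on?"))
print(mask_names("Speech is handled by Whisper and Kokoro."))
```

A filter like this has no job to do unless those names would otherwise show up in the output, which is the whole point of the question above.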

The Numbers Don't Add Up

Let's do the math that the r/indianstartups community did, and the 1,068 upvoting readers apparently did not.

Claimed hardware
8×H100
497 hours = ~$12,000–16,000
8× H100 SXM on RunPod costs ~$24–32/hour. 497 hours × $28 = $13,916 for the compute alone, and even the $24 floor comes to $11,928. No storage, no bandwidth, no retries. The claimed $11,560 doesn't cover it at any rate in that range.
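The arithmetic is easy to reproduce. The sketch below uses only the figures quoted above (497 hours, a $24 to $32 hourly range for the 8-GPU node, $11,560 claimed spend); the function and variable names are ours, chosen for illustration.

```python
# Back-of-envelope check of the compute bill, using only figures quoted in this dossier.
HOURS = 497                     # claimed training time on the 8x H100 node
CLAIMED_SPEND_USD = 11_560      # OP's stated total spend

# Hourly price for the whole 8-GPU node: low end, midpoint, high end of the quoted range.
NODE_RATES_USD_PER_HOUR = (24, 28, 32)


def compute_cost(hours: int, rate_per_hour: int) -> int:
    """Cost of renting the node for `hours` at a flat hourly rate."""
    return hours * rate_per_hour


costs = {rate: compute_cost(HOURS, rate) for rate in NODE_RATES_USD_PER_HOUR}

for rate, cost in costs.items():
    gap = cost - CLAIMED_SPEND_USD
    print(f"${rate}/hr -> ${cost:,} (${gap:,} over the claimed spend)")
```

Every rate in the range lands above the claimed total, before a single gigabyte of storage or a single failed run is paid for.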
Claimed training data
7 TB
Not stored. "Deleted to save costs."
RunPod volume storage runs about $0.10/GB/month, so keeping 7 TB would have cost roughly $700 a month. Convenient that it "wasn't stored": there is now no way to verify what was actually trained on, or whether anything was.
Claimed tokens trained
21.8T
LLaMA-3 cost $2M+ for 15T tokens
Meta trained LLaMA-3 on 15 trillion tokens, a run that cost millions of dollars in GPU time. Training 21.8 trillion tokens on $11,560 worth of compute would require time travel.
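A rough FLOP count makes the same point without any reference to Meta. The sketch below takes the OP's own numbers (5.82B parameters, 21.8T tokens, 8 GPUs, 497 hours) and adds three assumptions of ours, each marked in the code: the common 6 × parameters × tokens rule of thumb for training compute, an approximate dense BF16 peak for one H100 SXM, and a generous 40% sustained utilization. Treat the result as an order-of-magnitude estimate, not a measurement.

```python
# Rough training-compute estimate. Constants marked ASSUMPTION are ours, not the OP's.
PARAMS = 5.82e9             # claimed parameter count
TOKENS = 21.8e12            # claimed training tokens
GPUS = 8                    # claimed number of H100s
HOURS = 497                 # claimed wall-clock hours

FLOPS_PER_PARAM_TOKEN = 6   # ASSUMPTION: ~6 * N * D rule of thumb for training FLOPs
H100_PEAK_FLOPS = 989e12    # ASSUMPTION: approximate dense BF16 peak of one H100 SXM
UTILIZATION = 0.40          # ASSUMPTION: generous sustained hardware utilization

# Compute the claimed run would need versus what the claimed hardware could deliver.
required_flops = FLOPS_PER_PARAM_TOKEN * PARAMS * TOKENS
available_flops = GPUS * HOURS * 3600 * H100_PEAK_FLOPS * UTILIZATION
shortfall = required_flops / available_flops

print(f"required : {required_flops:.2e} FLOPs")
print(f"available: {available_flops:.2e} FLOPs")
print(f"the claimed hardware falls short by a factor of about {shortfall:.0f}")
```

Under these assumptions the claimed node delivers on the order of one hundredth of the compute the claimed run would need. Even at a physically unreachable 100% utilization, the gap stays above 50×.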
GitHub contributions
27
Total. Across entire account.
A developer who has been "building for 2.5 years" has 27 GitHub contributions total, with 6 commits in the main AI repo. Most are file dumps — no incremental commits, no development history.
The benchmark score
93.45
GLM-OCR scores 94+ publicly
download.sh fetches zai-org/GLM-OCR, which already achieves 94+ on OmniDocBench V1.5. Wrapping it and running the benchmark produces... approximately 93-94. Extraordinary coincidence.
Repos copied from others
4+
From GitHub user "SuperShary"
u/abucketheadfan found at least 4 repositories that were downloaded as ZIP files from another GitHub user (SuperShary), had their READMEs changed, and were pushed as original work.

The Community Called It

Post 1 got 1,068 upvotes. Then people started reading the code. Post 2 got a 48% upvote ratio — the internet's polite way of saying no.

Post 1: 1,068 upvotes · 88% ratio · 517 comments
Post 2: 0 upvotes · 48% ratio · 34 comments
↑ 98 upvotes

Why would people pay just to release it open source? For the amount you are asking, a small multimodal model is not going to be production use on any task.

u/Sad-Somewhere3686
↑ 22 upvotes

It reminds me of the COVID period, when LinkedIn was full of students from tier 2 and 3 colleges claiming they had "cracked COVID" using AI models and were asking for funding from people like Bill Gates. Please don't tag India anywhere.

u/HumbleThought123
↑ 21 upvotes

Just checked the source code, it's purely vibecoded, nothing new. If you want to raise funds you'll need more credibility. One look at this guy's GitHub makes it clear.

u/nightsy-owl
↑ 14 upvotes

Even the headline was written by ChatGPT *emdash* 😭

u/PitifulParamedic536 · Even the Reddit post was AI-written
↑ 11 upvotes

Reuses pretrained GLM-OCR, SigLIP, Whisper, SD-VAE, SD-UNet, Kokoro. Using these high-performing modules explains the benchmark scores. This appears to be a wrapper built on public pretrained models, not evidence of a fully trained proprietary 5.82B model.

u/RikoduSennin · Full technical breakdown
Community

Why most of (at least 4) your repositories are copied from a git account called SuperShary? Not even forks but downloaded as zip then readme.md changed → push to your own git. Why do you only have 27 git contributions?

u/abucketheadfan · Found the source repos
Community

You copy-pasted Falcon-3B, slapped on frozen SigLIP, Whisper, GLM-OCR, SD 1.5 + LCM-LoRA, and Kokoro, then wrote 150M parameters worth of baby MLPs and called it 'ArcleIntelligence.' That's not building a model, that's playing with Lego blocks and telling everyone you invented skyscrapers.

u/No_Cauliflower7923
OP's Own Words — In the Comments
"I get around 6100 usd of run pod credits from there but using some illegal stuff like making multiple email ID and apply"

— u/That-Bookkeeper-8316, admitting in comments to fraudulently obtaining compute credits by creating fake accounts. The "personal savings" story gets more interesting.

How This Unraveled

A timeline of a grift: from viral post to ratio, with a pit stop at deleting the evidence.

13 February 2026
The Demo Script is "Created"
The GitHub repo timestamp shows the "demo training script" was created on this date. The README declares twice in bold: "This is NOT the production source code" and "only a demo visualisation." The production code, weights, and datasets remain permanently "private."
Repo created: 13 Feb 2026 · 6 commits
May 2026 — Post 1
The Viral Launch: 1,068 Upvotes
The post hits r/indianstartups with the full emotional pitch — Bihar, middle class, $11,560, India vs DeepSeek. It goes viral. 88% upvote ratio. Most commenters respond with admiration. A few ask uncomfortable technical questions. Crowdfunding links appear in comments.
↑ 1,068 · 88% upvote ratio · 517 comments
May 2026 — The Debunk
u/IssueUpstairs6935 Opens download.sh
73 upvotes for a single comment: "OP claims a 5.82B model but the script downloads a 3B Falcon base. They're also just downloading GLM-OCR (which already has that 94+ score)." The thread turns. u/RikoduSennin confirms: frozen SigLIP, Whisper, SD-VAE, Kokoro. Everything is pre-built by others. The identity-hiding code is found.
73 upvote debunk · Thread consensus: wrapper
May 2026 — The Delete
Earlier Posts Quietly Disappear
u/HumbleThought123 (27 upvotes): "he is deleting his older post where his scam got called out." u/terminus_kite confirms: "Lmao, I'm not the only one who remembers that one huh." OP's defense: "They were deleted because they don't get any traction." The community is not convinced.
Posts deleted · Multiple witnesses
May 2026 — Post 2
"Here's the Proof" — Gets Ratio'd
A second post arrives titled "Here's the proof" with a benchmark screenshot. By now the community knows the pattern. 48% upvote ratio. First comment: "he is deleting his older post where his scam got called out." Payment links reposted. The crowdfunding ask continues despite everything.
↑ 0 · 48% upvote ratio · Ratio'd

The Media Didn't Look

u/IssueUpstairs6935's debunk comment was sitting at the top of the Reddit thread the whole time. Eight outlets published the story anyway. None clicked the GitHub link. None ran the code. None asked for a working demo.

Time from Reddit post → India Today publish: <24 hours  |  Time to check download.sh: ~5 minutes

India Today
No verification
Published ~May 8, 2026
Headline: "19-year-old from Bihar uses money saved for gaming laptop to build multimodal AI model." Zero fact-checking. Debunk comment was already live in the thread.
Financial Express
No verification
Anjana PV · May 8, 2026
"No AI knowledge, Bihar teen develops a 5.82B multimodal AI model using Rs 11 lakh savings." Reproduced all claims verbatim. No independent source contacted.
Digit.in
No verification
Vyom Ramani · May 11, 2026
Called the 93.45 benchmark score "unverified" — then published the story as if it weren't. Noted Reddit criticism of "vibe coded" code, didn't investigate it.
Zee News
No verification
Hindi · Still live
"Bihar teen skips laptop dream, builds powerful multimodal AI model at just 19." Published without contacting any technical reviewer or checking the GitHub repository.
Indian Startup News
No verification
ISN Team · May 7, 2026
First outlet to publish. Described benchmark as "private testing, not independently verified" — and published it anyway. Reposted by Dailyhunt to a wider audience.
KhasPress
No verification
May 18, 2026 · Still live
Hindi article titled "Will Bihar's 19-year-old beat Google and OpenAI?" Published a full 11 days after the community debunk, with no mention of the controversy.
Jagran Josh (YouTube)
125,000 views
Video still live · ~May 8
"गूगल और OpenAI को टक्कर देने को तैयार 'बिहार का लाल'" — Bihar's gem ready to challenge Google and OpenAI. Mentions vague "tech expert" skepticism. Does not mention the Reddit debunking. Fully promotional.
What They Could Have Checked in 5 Minutes
Open download.sh → see 6 external models being fetched
Search GLM-OCR on HuggingFace → 94+ OmniDocBench score, publicly documented
Check GitHub contribution graph → 27 total contributions across entire account
Read top Reddit comment → debunk at 73 upvotes, posted before any article went live
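The first check on that list does not even need a human reader. As a minimal sketch, the snippet below scans the model list as quoted earlier in this dossier and pulls out anything shaped like an `org/name` repository ID. The regex heuristic and the names `REPO_ID` and `find_repo_ids` are our own assumptions for illustration; the excerpt keeps the `...` elisions from the quoted script.

```python
import re

# Model list as quoted in this dossier (local paths and extra arguments elided as "...").
DOWNLOAD_SH_EXCERPT = '''
models = [
    ("tiiuae/Falcon-H1-3B-Base",  f"{MODELS}/falcon_h1_3b_base", ...),
    ("zai-org/GLM-OCR",            f"{MODELS}/glm_ocr", ...),
    ("google/siglip2-large-patch16-384",  ...),
    ("openai/whisper-medium",              ...),
    ("stabilityai/sd-vae-ft-mse",          ...),
    ("hexgrad/Kokoro-82M",                 ...),
]
'''

# ASSUMPTION: a quoted "org/name" string is treated as a Hugging Face repo ID.
# Local paths such as "{MODELS}/glm_ocr" start with "{" and are skipped.
REPO_ID = re.compile(r'"([A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*)"')


def find_repo_ids(script_text: str) -> list[str]:
    """Return quoted org/name strings in order of first appearance, without duplicates."""
    seen: list[str] = []
    for repo_id in REPO_ID.findall(script_text):
        if repo_id not in seen:
            seen.append(repo_id)
    return seen


repo_ids = find_repo_ids(DOWNLOAD_SH_EXCERPT)
for repo_id in repo_ids:
    org, name = repo_id.split("/")
    print(f"{org:12} {name}")
print(f"{len(repo_ids)} external models referenced")
```

Six organisations' names appear in the output, and none of them is ArcleIntelligence.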

"My debunk comment was sitting at the top of the original thread the whole time. They didn't even look at it. They just saw the Reddit engagement, swallowed the '19-year-old Bihar prodigy' hook and hit publish."
— u/IssueUpstairs6935

Source Links & Credits
Media links archived by u/Gold-Banana9084  ·  Original code debunk by u/IssueUpstairs6935  ·  Final call-out thread (post deleted by OP)  ·  Media mess breakdown

Do Not Send Money

There is no 5.82B model being trained. There is no $11,560 shortfall. There is a wrapper around six open-source models, a benchmark score borrowed from GLM-OCR, a GitHub account with 27 contributions, and a $35,000 ask.

The payment links quoted in the pitch above are listed as a warning, not a recommendation. Do not use them. If you have already donated, contact your bank or PayPal for a chargeback.

If you want to support real Indian AI research: look for projects that publish model weights, share training runs publicly, and don't ask for money before showing working code.