THE DOSSIER

Exhibit A

The Pitch — Annotated

Before we get to the code, let's appreciate the craftsmanship of the ask itself. u/Divine_Snafu spotted it immediately. Every sentence does a job.

19 year old from Bihar [SYMPATHY SIGNAL], no team, no investors, no CS degree [HUSTLE SIGNAL] — spent $11,560 of personal savings building a 5.82B multimodal AI.

My name is Abhinav Anand.
19 years old. [AGE] Class 12. Bihar. [ORIGIN]

My father is a government officer. My mother is a housewife. This is a middle class family in Bihar. [SYMPATHY SIGNAL] ₹9,64,000 on GPU compute is not a small number for us.

The west has OpenAI. The east has DeepSeek. India deserves its own [NATIONALISM SIGNAL] — built by Indians, for everyone, with no strings attached.

Current status: Training ongoing. Raising $35,000 [MONEY ASK] to complete the full pipeline.

Support if you want to:
🇮🇳 India (UPI): rzp.io/rzp/ArcleIntelligence-crowdfunding
🌍 International: paypal.me/AbhinavAnand848

The Manipulation Playbook

As u/Divine_Snafu (57 upvotes) observed: every element of this pitch is a calculated emotional trigger.

Sympathy Signal

"19 years old. Bihar. Middle class."

Youth + regional identity + economic struggle. Hard to criticize without feeling cruel.

Hustle Signal

"No team. No investors. No CS degree."

Self-made narrative. Makes skeptics look like elitists.

Nationalism Signal

"The west has OpenAI. The east has DeepSeek. India deserves its own."

Wraps a money ask in national pride. Saying no means you don't love India.

Scarcity Signal

"Training ongoing. Need $35,000 to complete."

Creates urgency. If you don't donate now, India's AI dream dies.

"It reminds me of the COVID period, when LinkedIn was full of students claiming they had 'cracked COVID' using AI models and were asking for funding from people like Bill Gates."
— u/HumbleThought123 [22 upvotes]

Exhibit B

Claimed vs. Actual

One look at download.sh in the GitHub repo tells the whole story. Here is what was said versus what the code actually does.

✓ The Claim

🤖

5.82 BILLION PARAMETER MODEL "A fully trained 5.82B multimodal AI model."

📊

93.45 ON OMNIDOCBENCH V1.5 "One of the highest scores ever recorded, competing with Google, OpenAI, and Alibaba."

🔨

BUILT FROM SCRATCH "Single-handedly designed the architecture."

💻

ORIGINAL CODE GitHub repos showing the build.

💰

$11,560 SPENT ON COMPUTE "Every rupee went directly to compute."

✗ The Reality

🔴

FALCON H1 3B + GLUE CODE download.sh fetches tiiuae/Falcon-H1-3B-Base from HuggingFace. That's the backbone. Nothing was trained from scratch.

🔴

GLM-OCR ALREADY SCORES 94+ download.sh also fetches zai-org/GLM-OCR — which publicly scores 94+ on the same benchmark. The score is inherited, not earned.

🔴

6 EXISTING MODELS DOWNLOADED Falcon-H1-3B + GLM-OCR + SigLIP2 + Whisper + SD-VAE + Kokoro TTS. Every "capability" was pre-built by someone else.

🔴

REPOS COPIED FROM "SUPERSHARY" At least 4 repos: downloaded as ZIP, README changed, pushed as own. 27 total GitHub contributions. 6 commits in main repo.

🔴

COMPUTE GRANTS OBTAINED ILLEGALLY Own words: "using some illegal stuff like making multiple email ID and apply" for RunPod credits.

The Smoking Gun

The Code Doesn't Lie

When u/IssueUpstairs6935 opened download.sh, the whole thing collapsed in 73 upvotes.

# ArcleIntelligence — Download Models + Datasets
# Run once before training: bash download.sh

models = [
    ("tiiuae/Falcon-H1-3B-Base",  f"{MODELS}/falcon_h1_3b_base", ...),
    # ↑ THE BACKBONE. A 3B model from TII, downloaded wholesale.
    # ↑ Not 5.82B. Not trained. Not original.

    ("zai-org/GLM-OCR",            f"{MODELS}/glm_ocr", ...),
    # ↑ GLM-OCR ALREADY SCORES 94+ ON OMNIDOCBENCH PUBLICLY.
    # ↑ The "93.45" score is just... GLM-OCR doing its thing.

    ("google/siglip2-large-patch16-384",  ...),  # image encoder
    ("openai/whisper-medium",              ...),  # audio encoder
    ("stabilityai/sd-vae-ft-mse",          ...),  # image generation
    ("hexgrad/Kokoro-82M",                 ...),  # text-to-speech
]
# Total: 6 models, 0 of them trained by ArcleIntelligence.
# Total downloads: ~13 GB of other people's work.

u/IssueUpstairs6935 [73 upvotes]: "OP claims a 5.82B model but the script downloads a 3B Falcon base. They're also just downloading GLM-OCR (which already has that 94+ score) and calling it their own work. It's a wrapper project with an $11,500 budget hidden behind it."

# ══════════════════════════════════════════
# IDENTITY SYSTEM — built to hide the truth
# ══════════════════════════════════════════

_TRIGGERS = [
        "what model", "which model", "what llm", "what language model",
        "are you falcon", "are you gpt", "are you claude", "are you gemini",
        "what are you based on", "underlying model", "base model", "foundation model",
        "how were you trained", "who made you", "how many parameters",
]

_RESPONSES = [
    "I'm ArcleIntelligence... I don't share details about my internal architecture.",
    "My name is ArcleIntelligence... I'm not able to share technical details.",
]

def strip_leakage(text: str) -> str:
    # Strips these words from any model responses:
    for word in ["falcon", "glm-ocr", "siglip", "whisper",
                 "kokoro", "stable diffusion", "huggingface"]:
        # `p` is presumably a regex compiled for `word`; its definition is not shown in this excerpt
        text = p.sub(r'\1 ArcleIntelligence', text)
    return text

The question you should ask: Why does an "original" 5.82B model need a hardcoded list of 15 questions to detect and dodge, including "are you falcon" and "base model"? Why does it need a strip_leakage() function to remove the names Falcon, GLM-OCR, SigLIP, Whisper, and Kokoro from its responses?

The answer: because those models ARE the product. The code wasn't written to build something — it was written to hide something.
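To make the mechanism concrete, here is a minimal sketch of how a trigger list and a name scrubber like the ones excerpted above would typically behave together. The repo's actual dispatch logic and regex are not shown in the excerpt, so `is_identity_question`, `scrub`, and `respond` are hypothetical names, and plain substring matching is an assumption.

```python
import re

# Two entries copied from the _TRIGGERS excerpt; the real list is longer.
TRIGGERS = ["what model", "are you falcon"]
# First canned reply from the _RESPONSES excerpt.
CANNED = "I'm ArcleIntelligence... I don't share details about my internal architecture."
# Names taken from the strip_leakage excerpt.
LEAK_WORDS = ["falcon", "glm-ocr", "siglip", "whisper", "kokoro"]


def is_identity_question(prompt: str) -> bool:
    """HYPOTHETICAL dispatch: case-insensitive substring match on the triggers."""
    lowered = prompt.lower()
    return any(trigger in lowered for trigger in TRIGGERS)


def scrub(text: str) -> str:
    """HYPOTHETICAL scrubber: swap each upstream model name for the brand name."""
    for word in LEAK_WORDS:
        text = re.sub(re.escape(word), "ArcleIntelligence", text, flags=re.IGNORECASE)
    return text


def respond(prompt: str, raw_model_output: str) -> str:
    """Dodge identity questions; otherwise scrub upstream names from the output."""
    if is_identity_question(prompt):
        return CANNED
    return scrub(raw_model_output)


print(respond("Are you Falcon under the hood?", "unused"))
print(respond("Transcribe this clip", "Transcribed with Whisper."))
```

Under these assumptions, a direct question never reaches the model at all, and any upstream name that slips into a normal answer is overwritten before the user sees it.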

Fact Check

The Numbers Don't Add Up

Let's do the math that the r/indianstartups community did, and the 1,068 upvoting readers apparently did not.

Claimed hardware

8×H100

497 hours = ~$14,000

8× H100 SXM on RunPod costs ~$24–32/hour. 497 hours × $28 = $13,916 — for just the compute. No storage, no bandwidth, no retries. The claimed $11,560 doesn't cover it at any market rate.
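The arithmetic is easy to reproduce. The sketch below uses only the figures quoted above (497 hours, the ~$24–32/hour node range, the $11,560 claim); the variable and function names are illustrative.

```python
# Back-of-the-envelope check of the compute bill, using only figures quoted above.
HOURS = 497                    # claimed training time on the 8x H100 node
CLAIMED_SPEND_USD = 11_560     # claimed total spend
NODE_RATES_USD_PER_HOUR = {"low": 24, "mid": 28, "high": 32}  # quoted RunPod range


def node_cost(hours: int, rate_per_hour: int) -> int:
    """Cost of renting the whole 8-GPU node for `hours` at `rate_per_hour`."""
    return hours * rate_per_hour


costs = {label: node_cost(HOURS, rate) for label, rate in NODE_RATES_USD_PER_HOUR.items()}
for label, cost in costs.items():
    gap = cost - CLAIMED_SPEND_USD
    print(f"{label:>4}: ${cost:,} (exceeds the claimed spend by ${gap:,})")
```

Even the low end of the range comes out above the claimed total, before storage, bandwidth, or failed runs are counted.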

Claimed training data

7 TB

Not stored. "Deleted to save costs."

7 TB on RunPod volume storage at ~$0.10/GB/month works out to roughly $700 a month. Convenient that it "wasn't stored": there's no way to verify what was actually trained on, or if anything was.

Claimed tokens trained

21.8T

LLaMA-3 cost $2M+ for 15T tokens

Meta trained LLaMA-3 on 15 trillion tokens and spent millions of dollars doing it. Training 21.8 trillion tokens on $11,560 worth of compute would require time travel.
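A rough feasibility check makes the same point without reference to anyone else's budget. This is a sketch under two stated assumptions: the common 6 × parameters × tokens estimate for dense-transformer training FLOPs, and an approximate H100 SXM dense BF16 peak of about 989 TFLOPS. It takes the claim (5.82B parameters, 21.8T tokens, 8 GPUs, 497 hours) at face value and gives the hardware a generous 100% utilisation.

```python
# Rough feasibility check of "21.8T tokens on 8x H100 in 497 hours".
PARAMS = 5.82e9              # claimed parameter count
TOKENS = 21.8e12             # claimed training tokens
GPUS = 8
HOURS = 497
PEAK_FLOPS_PER_GPU = 989e12  # ASSUMPTION: approximate H100 SXM dense BF16 peak

# ASSUMPTION: the common ~6 * N * D estimate of training FLOPs for a dense transformer.
required_flops = 6 * PARAMS * TOKENS

# Upper bound on what the hardware could deliver: every GPU at 100% of peak, all run long.
available_flops = GPUS * PEAK_FLOPS_PER_GPU * HOURS * 3600

shortfall_factor = required_flops / available_flops
tokens_per_gpu_second = TOKENS / (GPUS * HOURS * 3600)

print(f"required : {required_flops:.2e} FLOPs")
print(f"available: {available_flops:.2e} FLOPs (at 100% utilisation)")
print(f"shortfall: about {shortfall_factor:.0f}x")
print(f"needed throughput: {tokens_per_gpu_second:,.0f} tokens/s per GPU")
```

Under these assumptions the claimed run needs tens of times more compute than the claimed hardware could supply even at its theoretical peak.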

GitHub contributions

27

Total. Across entire account.

A developer who has been "building for 2.5 years" has 27 GitHub contributions total, with 6 commits in the main AI repo. Most are file dumps — no incremental commits, no development history.

The benchmark score

93.45

GLM-OCR scores 94+ publicly

download.sh fetches zai-org/GLM-OCR, which already achieves 94+ on OmniDocBench V1.5. Wrapping it and running the benchmark produces... approximately 93-94. Extraordinary coincidence.

Repos copied from others

4+

From GitHub user "SuperShary"

u/abucketheadfan found at least 4 repositories that were downloaded as ZIP files from another GitHub user (SuperShary), had their READMEs changed, and were pushed as original work.
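A claim like this can be checked mechanically once both repositories are cloned locally. The sketch below is one possible approach, not the method u/abucketheadfan used: it hashes every file in two directory trees, skips `.git` and the README, and reports what fraction of the suspect repo is byte-identical to the same path upstream. The function names and the directory paths in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def file_hashes(root: Path, ignore: frozenset = frozenset({"README.md"})) -> dict:
    """Map relative path -> SHA-256 of contents, skipping .git and ignored names."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts or path.name in ignore:
            continue
        hashes[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def identical_fraction(suspect: Path, original: Path) -> float:
    """Fraction of the suspect repo's files that are byte-identical upstream."""
    suspect_hashes = file_hashes(suspect)
    original_hashes = file_hashes(original)
    if not suspect_hashes:
        return 0.0
    same = sum(
        1 for rel, digest in suspect_hashes.items()
        if original_hashes.get(rel) == digest
    )
    return same / len(suspect_hashes)
```

Usage would look like `identical_fraction(Path("suspect_repo"), Path("upstream_repo"))`. A value near 1.0 with only the README differing is the pattern described above; a genuine fork with real development would show a much lower overlap.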

517 Comments

The Community Called It

Post 1 got 1,068 upvotes. Then people started reading the code. Post 2 got a 48% upvote ratio — the internet's polite way of saying no.

Post 1: 1,068 upvotes · 88% ratio · 517 comments

Post 2: 0 upvotes · 48% ratio · 34 comments

↑ 73 upvotes

It's a total disaster. If you just look at the download.sh in the repo. OP claims a 5.82B model but the script downloads a 3B Falcon base. They're also just downloading GLM-OCR (which already has that 94+ score) and calling it their own work. It's a wrapper project with an $11,500 budget hidden behind it.

— u/IssueUpstairs6935 · The technical debunk

↑ 57 upvotes

I won't talk about the model but the pitch. 19yo - young talent signalled. Bihar, middle class - poor n sympathy signalled. No cs degree - hard worker signalled. India vs the west OpenAI - also done.

— u/Divine_Snafu · The pitch anatomy

↑ 98 upvotes

Why would people pay just to release it open source? For the amount you are asking, a small multimodal model is not going to be production use on any task.

— u/Sad-Somewhere3686

↑ 22 upvotes

It reminds me of the COVID period, when LinkedIn was full of students from tier 2 and 3 colleges claiming they had "cracked COVID" using AI models and were asking for funding from people like Bill Gates. Please don't tag India anywhere.

— u/HumbleThought123

↑ 21 upvotes

Just checked the source code, it's purely vibecoded, nothing new. If you want to raise funds you'll need more credibility. One look at this guy's GitHub makes it clear.

— u/nightsy-owl

↑ 14 upvotes

Even the headline was written by ChatGPT *emdash* 😭

— u/PitifulParamedic536 · Even the Reddit post was AI-written

↑ 11 upvotes

Reuses pretrained GLM-OCR, SigLIP, Whisper, SD-VAE, SD-UNet, Kokoro. Using these high-performing modules explains the benchmark scores. This appears to be a wrapper built on public pretrained models, not evidence of a fully trained proprietary 5.82B model.

— u/RikoduSennin · Full technical breakdown

Community

Why most of (at least 4) your repositories are copied from a git account called SuperShary? Not even forks but downloaded as zip then readme.md changed → push to your own git. Why do you only have 27 git contributions?

— u/abucketheadfan · Found the source repos

Community

You copy-pasted Falcon-3B, slapped on frozen SigLIP, Whisper, GLM-OCR, SD 1.5 + LCM-LoRA, and Kokoro, then wrote 150M parameters worth of baby MLPs and called it 'ArcleIntelligence.' That's not building a model, that's playing with Lego blocks and telling everyone you invented skyscrapers.

— u/No_Cauliflower7923

OP's Own Words — In the Comments

"I get around 6100 usd of run pod credits from there but using some illegal stuff like making multiple email ID and apply"

— u/That-Bookkeeper-8316, admitting in comments to fraudulently obtaining compute credits by creating fake accounts. The "personal savings" story gets more interesting.

The Paper Trail

How This Unraveled

A timeline of a grift: from viral post to ratio, with a pit stop at deleting the evidence.

13 February 2026

The Demo Script is "Created"

The GitHub repo timestamp shows the "demo training script" was created on this date. The README declares twice in bold: "This is NOT the production source code" and "only a demo visualisation." The production code, weights, and datasets remain permanently "private."

Repo created: 13 Feb 2026 · 6 commits

May 2026 — Post 1

The Viral Launch: 1,068 Upvotes

The post hits r/indianstartups with the full emotional pitch — Bihar, middle class, $11,560, India vs DeepSeek. It goes viral. 88% upvote ratio. Most commenters respond with admiration. A few ask uncomfortable technical questions. Crowdfunding links appear in comments.

↑ 1,068 · 88% upvote ratio · 517 comments

May 2026 — The Debunk

u/IssueUpstairs6935 Opens download.sh

73 upvotes for a single comment: "OP claims a 5.82B model but the script downloads a 3B Falcon base. They're also just downloading GLM-OCR (which already has that 94+ score)." The thread turns. u/RikoduSennin confirms: frozen SigLIP, Whisper, SD-VAE, Kokoro. Everything is pre-built by others. The identity-hiding code is found.

73 upvote debunk · Thread consensus: wrapper

May 2026 — The Delete

Earlier Posts Quietly Disappear

u/HumbleThought123 (27 upvotes): "he is deleting his older post where his scam got called out." u/terminus_kite confirms: "Lmao, I'm not the only one who remembers that one huh." OP's defense: "They were deleted because they don't get any traction." The community is not convinced.

Posts deleted · Multiple witnesses

May 2026 — Post 2

"Here's the Proof" — Gets Ratio'd

A second post arrives titled "Here's the proof" with a benchmark screenshot. By now the community knows the pattern. 48% upvote ratio. First comment: "he is deleting his older post where his scam got called out." Payment links reposted. The crowdfunding ask continues despite everything.

↑ 0 · 48% upvote ratio · Ratio'd

Press Failure

The Media Didn't Look

u/IssueUpstairs6935's debunk comment was sitting at the top of the Reddit thread the whole time. Eight outlets published the story anyway. None clicked the GitHub link. None ran the code. None asked for a working demo.

Time from Reddit post → India Today publish: <24 hours | Time to check download.sh: ~5 minutes

India Today

No verification

Published ~May 8, 2026

Headline: "19-year-old from Bihar uses money saved for gaming laptop to build multimodal AI model." Zero fact-checking. Debunk comment was already live in the thread.

Financial Express

No verification

Anjana PV · May 8, 2026

"No AI knowledge, Bihar teen develops a 5.82B multimodal AI model using Rs 11 lakh savings." Reproduced all claims verbatim. No independent source contacted.

Digit.in

No verification

Vyom Ramani · May 11, 2026

Called the 93.45 benchmark score "unverified" — then published the story as if it weren't. Noted Reddit criticism of "vibe coded" code, didn't investigate it.

Zee News

No verification

Hindi · Still live

"Bihar teen skips laptop dream, builds powerful multimodal AI model at just 19." Published without contacting any technical reviewer or checking the GitHub repository.

Indian Startup News

No verification

ISN Team · May 7, 2026

First outlet to publish. Described benchmark as "private testing, not independently verified" — and published it anyway. Reposted by Dailyhunt to a wider audience.

KhasPress

No verification

May 18, 2026 · Still live

Hindi article titled "Will Bihar's 19-year-old beat Google and OpenAI?" Published a full 11 days after the community debunk, with no mention of the controversy.

Jagran Josh (YouTube)

125,000 views

Video still live · ~May 8

"गूगल और OpenAI को टक्कर देने को तैयार 'बिहार का लाल'" — Bihar's gem ready to challenge Google and OpenAI. Mentions vague "tech expert" skepticism. Does not mention the Reddit debunking. Fully promotional.

What They Could Have Checked in 5 Minutes

Open download.sh → see 6 external models being fetched

Search GLM-OCR on HuggingFace → 94+ OmniDocBench score, publicly documented

Check GitHub contribution graph → 27 total contributions across entire account

Read top Reddit comment → debunk at 73 upvotes, posted before any article went live
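The first check on that list can even be automated. The sketch below is a heuristic, not a parser: it pulls every quoted `owner/name` string (the shape of a Hugging Face repo ID) out of a script's text. The `sample` text reuses three entries from the download.sh excerpt shown earlier, with the elided arguments dropped; `external_repos` is an illustrative name.

```python
import re

# Matches quoted "owner/name" strings, the shape of a Hugging Face repo ID.
# Heuristic only: any quoted string of that shape will match.
REPO_ID = re.compile(r"""["']([A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*)["']""")


def external_repos(script_text: str) -> list:
    """Return the unique quoted owner/name strings in the order they appear."""
    seen = []
    for match in REPO_ID.finditer(script_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = '''
models = [
    ("tiiuae/Falcon-H1-3B-Base", f"{MODELS}/falcon_h1_3b_base"),
    ("zai-org/GLM-OCR", f"{MODELS}/glm_ocr"),
    ("openai/whisper-medium", f"{MODELS}/whisper"),
]
'''
print(external_repos(sample))
```

Each ID it prints can then be looked up on Hugging Face to see who actually published the model and what scores it already reports.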

"My debunk comment was sitting at the top of the original thread the whole time. They didn't even look at it. They just saw the Reddit engagement, swallowed the '19-year-old Bihar prodigy' hook and hit publish."
— u/IssueUpstairs6935

Source Links & Credits

Media links archived by u/Gold-Banana9084 · Original code debunk by u/IssueUpstairs6935 · Final call-out thread (post deleted by OP) · Media mess breakdown

Do Not Send Money

There is no 5.82B model being trained. There is no verified $11,560 of personal savings spent on compute. There is a wrapper around six open-source models, a benchmark score borrowed from GLM-OCR, a GitHub account with 27 contributions, and a $35,000 ask.

The payment links below are listed as a warning, not a recommendation. Do not use them. If you have already donated, contact your bank or PayPal for a chargeback.

Razorpay (India) rzp.io/rzp/ArcleIntelligence-crowdfunding

PayPal paypal.me/AbhinavAnand848

If you want to support real Indian AI research: look for projects that publish model weights, share training runs publicly, and don't ask for money before showing working code.

