📚

SMLGateway Guide

An AI gateway that puts 30+ free AI providers behind a single OpenAI-compatible API. It works with the OpenAI SDK, LangChain, thClaws, Hermes Agent, OpenClaw, and any other framework that speaks the OpenAI API. Use sml/auto and the gateway picks the best model automatically.

ⓘ

Local: no auth, so apiKey: "dummy" works as-is · Production: requires a Bearer key (GATEWAY_API_KEY from the owner). Logging in with Google grants access to the UI only; the /v1/* endpoints are restricted to the owner

⚡

Quick Connect

SMLGateway is an OpenAI-compatible API: any client library that works with the OpenAI SDK can point its baseURL at http://localhost:3334/v1 and start working immediately

Base URL

http://localhost:3334/v1

API Key

dummy (not checked in local mode)

Model

sml/auto

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3334/v1",
  apiKey: "dummy",
});

// auto: the gateway picks the best model for you
const chat = await client.chat.completions.create({
  model: "sml/auto",
  messages: [{ role: "user", content: "สวัสดีครับ" }],
});
console.log(chat.choices[0].message.content);

// tool calling
const tools = await client.chat.completions.create({
  model: "sml/tools",
  messages: [{ role: "user", content: "กรุงเทพอากาศเป็นยังไง" }],
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      description: "ดูสภาพอากาศ",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  }],
});

// streaming
const stream = await client.chat.completions.create({
  model: "sml/auto",
  messages: [{ role: "user", content: "เล่านิทานสั้นๆ" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

* npm install openai

📄

System Overview

SMLGateway is a “middleman” between clients and 26 free AI providers (OpenRouter, Kilo, Google, Groq, Cerebras, SambaNova, Mistral, Ollama, GitHub, Fireworks, Cohere, Cloudflare, HuggingFace, NVIDIA, Chutes, LLM7, Scaleway, Pollinations, Ollama Cloud, SiliconFlow, glhf, Together, Hyperbolic, Z.AI, Alibaba Qwen, Reka)

A background worker scans for new models, examines them at the configured level (primary / middle / high / university), and appoints the strongest models in each category as teachers that act as graders

📁 Ports

3333 — external Caddy (300s timeout)

3334 — in-compose Caddy (LB)

5434 — Postgres

6382 — Redis

🔮 Worker cycle

Every 15 minutes:

1. Scan providers

2. Health check + exam

3. Appoint teachers

🔥 Warmup

Every 2 minutes:

ping models that passed the exam

keep connections warm

🎯

Virtual Models

These are not real models. They are “shortcut names”: the gateway picks a real model automatically based on the request context

Model	Description
`sml/auto`	Picks the best model automatically. This is the recommended default
`sml/fast`	Picks the model with the lowest latency (for short tasks that need a quick answer)
`sml/tools`	Picks only from models that support tool/function calling
`sml/thai`	Picks a model that is strong in Thai (highest exam score in the thai category)
`sml/consensus`	Sends the request to several models at once and compares the answers

ⓘ

If a request contains tools, image_url, or response_format, the gateway auto-detects this and picks a model that supports it, so sml/auto alone is enough
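The auto-detection described above can be pictured with a small sketch. This is not the gateway's code: the function name, the `Capability` labels, and the request type are hypothetical and only illustrate which request fields would imply which model capability.

```typescript
// Hypothetical sketch: maps request fields to the capability a model must have.
type Capability = "tools" | "vision" | "json";

interface ChatRequest {
  tools?: unknown[];
  response_format?: unknown;
  messages: { role: string; content: string | { type: string }[] }[];
}

function requiredCapabilities(req: ChatRequest): Capability[] {
  const caps: Capability[] = [];
  // A non-empty tools array means the model must support function calling.
  if (Array.isArray(req.tools) && req.tools.length > 0) caps.push("tools");
  // Any image_url content part means the model must accept images.
  const hasImage = req.messages.some(
    (m) => Array.isArray(m.content) && m.content.some((part) => part.type === "image_url"),
  );
  if (hasImage) caps.push("vision");
  // response_format implies structured (JSON) output support.
  if (req.response_format !== undefined) caps.push("json");
  return caps;
}
```

A plain text request yields an empty list, so every model stays eligible; each detected capability narrows the candidate pool.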

Targeting a specific model

To use one particular model, specify provider + model ID:

groq/llama-3.3-70b-versatile
openrouter/qwen/qwen3-coder:free
cerebras/qwen-3-235b-a22b-instruct-2507
mistral/mistral-large-2411

🇹🇭 Thai-native models (built in Thailand)

Two free providers. Register at /setup, then call them directly:

# Typhoon (SCB 10X), sign up: https://opentyphoon.ai
typhoon/typhoon-v2.5-30b-a3b-instruct

# ThaiLLM (NSTDA national platform), sign up: https://playground.thaillm.or.th
# 4 Thai-built models behind a single endpoint:
thaillm/OpenThaiGPT-ThaiLLM-8B-Instruct-v7.2     # AIEAT
thaillm/Typhoon-S-ThaiLLM-8B-Instruct            # SCB 10X
thaillm/Pathumma-ThaiLLM-qwen3-8b-think-3.0.0    # NECTEC, supports thinking
thaillm/THaLLE-0.2-ThaiLLM-8B-fa                 # KBTG

🧠 Thinking / Reasoning Mode

The gateway auto-enables thinking for models whose scan shows reasoning support (stored in models.supports_reasoning). Detection uses two sources:

Source 1: OpenRouter metadata, where supported_parameters includes reasoning
Source 2: a regex on the model name: qwen3 / o1 / o3 / o4 / deepseek-r1 / thinking / magistral / pathumma-think / lfm-thinking
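As a rough illustration of Source 2, a name-based check could look like the sketch below. The exact pattern the gateway uses is not shown in this guide, so the regex (including the separator handling around o1 / o3 / o4) and the function name are assumptions.

```typescript
// Assumed pattern built from the names listed above; the gateway's real regex may differ.
// o1 / o3 / o4 are only matched as whole name segments to avoid hits inside other words.
const REASONING_NAME =
  /qwen3|(^|[\/\-_.])o[134]($|[\-_.:])|deepseek-r1|thinking|magistral|pathumma-think|lfm-thinking/i;

function looksLikeReasoningModel(modelId: string): boolean {
  return REASONING_NAME.test(modelId);
}
```

With this pattern, `thaillm/Pathumma-ThaiLLM-qwen3-8b-think-3.0.0` matches through `qwen3`, while `groq/llama-3.3-70b-versatile` does not match.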

When it forwards the request, the gateway adds these fields itself:

{
  "reasoning": { "effort": "medium" },     // OpenRouter / Anthropic / OpenAI o-series
  "enable_thinking": true,                  // Qwen3 / DashScope / vLLM
  "max_tokens": 2000                        // leaves room for the reasoning trace
}

Opt-out (to turn thinking off):

{
  "model": "thaillm/Pathumma-ThaiLLM-qwen3-8b-think-3.0.0",
  "messages": [...],
  "reasoning": false      // or "enable_thinking": false
}

ⓘ

See the “สมุดจดงาน” (work log) section: exam log entries that used thinking mode carry a 🧠 tag:
📝 เริ่มสอบ [middle] 🧠 thinking: thaillm/Pathumma-ThaiLLM-qwen3-8b-think-3.0.0

📦

Installation

Recommended method: Docker Compose

Install Docker Desktop

Download it from https://www.docker.com/products/docker-desktop/, install it, and leave it running (the whale icon should be green)

Clone + configure

git clone <repo-url> sml-gateway
cd sml-gateway
cp .env.example .env.local

Edit .env.local and add API keys for the providers you want to use (you do not need all of them)

Build + Start

docker compose up -d --build

The first build takes 3-10 minutes. Then open http://localhost:3334/

Add API keys through the dashboard

On the dashboard, click Setup and enter the API key for each provider (the Test button next to each field checks that the key works before you save)

Wait for the worker to scan

After keys are added, the worker scans and examines models automatically within 1-2 minutes (follow progress in the dashboard's “สมุดจดงาน” (work log) section)

Reset data (start over)

docker compose down
docker volume rm sml-gateway_sml-gateway-data
docker compose up -d --build

🔒

Authentication: three methods to choose from

The system supports three admin login methods. They are auto-detected from .env: setting a method's env vars enables that method. There is no AUTH_MODE flag.

The three methods

Method	Trigger env	Best for	Session
① Google OAuth	GOOGLE_CLIENT_ID + GOOGLE_CLIENT_SECRET + NEXTAUTH_SECRET + NEXTAUTH_URL	Teams with Gmail accounts, per-email audit	JWT, 30 days
② Admin Password	ADMIN_PASSWORD	No Gmail / air-gapped / break-glass	HMAC cookie, 7 days
③ Bearer Key	GATEWAY_API_KEY	CI / SDK / curl / automation	stateless (sent with every request)

ⓘ

Local mode (no env vars set for any method) → UI + API fully open, no auth. Suited to Docker Desktop on your own machine
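The env-driven detection can be summarized in a few lines. This is a sketch of the rule stated in the table, not the gateway's source; the function name and return labels are made up for illustration.

```typescript
// Hypothetical sketch of "set a method's env vars = that method is enabled".
type AuthMethod = "google" | "password" | "bearer";

function enabledAuthMethods(env: Record<string, string | undefined>): AuthMethod[] {
  const has = (key: string): boolean => (env[key] ?? "").trim() !== "";
  const methods: AuthMethod[] = [];
  // Google OAuth needs all four variables from the table.
  const googleVars = ["GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "NEXTAUTH_SECRET", "NEXTAUTH_URL"];
  if (googleVars.every(has)) methods.push("google");
  if (has("ADMIN_PASSWORD")) methods.push("password");
  if (has("GATEWAY_API_KEY")) methods.push("bearer");
  // An empty result corresponds to local mode (no auth).
  return methods;
}
```

Calling `enabledAuthMethods(process.env)` on a machine with an empty .env.local would return an empty array, which is the local-mode case.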

Three real-world scenarios

A) Running on your own machine: leave .env.local empty

# Personal machine: no auth, every endpoint is open
# empty = local mode

B) VPS + password: the simplest option, no dependency on Google

# .env.production on the VPS
GATEWAY_API_KEY=sk-gw-<generate>         # SDK / curl
ADMIN_PASSWORD=<random-24-base64>        # admin UI login (7-day cookie)
AUTH_OWNER_EMAIL=admin@example.com       # metadata (shown in audit)

# Generate:
#   node -e "console.log('sk-gw-' + require('crypto').randomBytes(32).toString('hex'))"
#   node -e "console.log(require('crypto').randomBytes(24).toString('base64').replace(/[+/=]/g,''))"

C) VPS + Domain + HTTPS + Google OAuth: production-grade

# .env.production on the VPS
GATEWAY_API_KEY=sk-gw-<generate>
ADMIN_PASSWORD=<random-24-base64>        # fallback in case Google is down

AUTH_OWNER_EMAIL=alice@gmail.com,bob@gmail.com,cto@gmail.com
GOOGLE_CLIENT_ID=<google-console>
GOOGLE_CLIENT_SECRET=<google-console>
NEXTAUTH_SECRET=<random-32-base64>
NEXTAUTH_URL=https://your-domain.com

# Google Console redirect URI:
#   {NEXTAUTH_URL}/api/auth/callback/google

Auth chain (first match wins)

/admin/* + mutating /api/*  →  1. Bearer GATEWAY_API_KEY     →  pass
                             →  2. Signed sml_admin cookie  →  pass  (password)
                             →  3. Google session + owner   →  pass  (OAuth)
                             →  else  →  /login (page) or 401 (API)

/v1/*   →  Bearer sk-gw-* (master) or Bearer sml_live_* only

Two kinds of Bearer key for `/v1/*`

Key	/v1/*	/api/admin/*	Source
`sk-gw-...` (master)	✅	✅	set in .env
`sml_live_...`	✅	❌	issued by an admin at /admin/keys
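The scope table above reduces to a small decision function. The sketch below only looks at the key prefix to show the scoping rule; a real check would also verify the key against the configured master key or the stored hash. Function names are hypothetical.

```typescript
// Hypothetical sketch of the key-scope table; prefix check only, no signature or DB lookup.
type KeyKind = "master" | "client" | "unknown";

function keyKind(token: string): KeyKind {
  if (token.startsWith("sk-gw-")) return "master";
  if (token.startsWith("sml_live_")) return "client";
  return "unknown";
}

function mayAccess(token: string, path: string): boolean {
  const kind = keyKind(token);
  // /v1/* accepts both key kinds.
  if (path.startsWith("/v1/")) return kind === "master" || kind === "client";
  // /api/admin/* accepts the master key only.
  if (path.startsWith("/api/admin/")) return kind === "master";
  return false;
}
```

This makes the practical consequence explicit: a client holding an `sml_live_...` key can call the chat endpoints but cannot list or create keys.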

Issuing keys to clients

The admin logs in (Google or password) → opens /admin/keys → enters a label and an optional expiry → clicks “+ สร้าง key” (create key) → the sml_live_... key is shown once (copy it and send it to the client)

Keys are stored in the DB as SHA-256 hashes: they cannot be viewed again later, can be revoked or paused individually, and carry a last_used_at audit field
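Storing only a hash is why the plaintext can be shown just once. A minimal sketch of that idea with Node's built-in crypto module follows; the token layout (`sml_live_` plus random hex) and the function names are assumptions for illustration, not the gateway's actual format.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hash a plaintext key the way the text describes (SHA-256, hex encoded).
function hashKey(plaintext: string): string {
  return createHash("sha256").update(plaintext, "utf8").digest("hex");
}

// Compare a presented key against the stored hash in constant time.
function verifyKey(presented: string, storedHash: string): boolean {
  const a = Buffer.from(hashKey(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}

// Assumed token layout: prefix + 24 random bytes as hex.
const issuedKey = "sml_live_" + randomBytes(24).toString("hex");
const storedHash = hashKey(issuedKey); // only this value would be persisted
```

Because SHA-256 is one-way, the database row is enough to verify a request but not to recover the key, which matches the "shown once" behavior above.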

ⓘ

To add a new admin email: edit AUTH_OWNER_EMAIL in .env → restart
ssh droplet → nano /opt/sml-gateway/.env.production → bash scripts/deploy-droplet.sh

🔌 Admin API (ops / automation)

Every endpoint here requires the same auth as the admin pages (master Bearer / cookie / Google), except /api/perf-insights, which is public:

# Manage gateway keys
GET    /api/admin/keys                 → list (hashes only, no plaintext)
POST   /api/admin/keys                 → create { label, expiresAt?, notes? }; the token is returned once
PATCH  /api/admin/keys/:id             → { enabled: true|false }
DELETE /api/admin/keys/:id             → permanent revoke

# Circuit breaker (per-model)
GET    /api/admin/circuits             → { open[], halfOpen[], warnings[], summary }
DELETE /api/admin/circuits?provider=X&modelId=Y   → reset one pair
DELETE /api/admin/circuits             → reset all (nuclear)

# Performance insights (public — for dashboard widget)
GET    /api/perf-insights              → {
  requestsLastHour, p50/p95 latency, errorRate,
  counts: { cache:hit/miss, hedge:win/loss, spec:fire/win, sticky:hit, demote:rate-limit },
  rates:  { cacheHitRate, hedgeWinRate, speculativeWinRate, stickyPinRate }
}

warnings = (provider, model) pairs with a fail streak ≥ 3 within 30 seconds that have not tripped yet: an early warning before the circuit opens. The dashboard's “⚡ ประสิทธิภาพ” (performance) section shows 8 real-time cards (refreshed every 15s) plus a red 🚨 “Circuits open: X” card when the trip count is > 0
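One way to read the warning rule is "the most recent outcomes are at least 3 consecutive failures, all within the last 30 seconds, and the circuit is still closed". The sketch below implements that reading; how the gateway actually tracks streaks is not documented here, so the data shape and function name are assumptions.

```typescript
// Hypothetical outcome record for one (provider, model) pair; `at` is epoch milliseconds.
interface Outcome {
  at: number;
  ok: boolean;
}

const WINDOW_MS = 30_000;
const WARN_STREAK = 3;

// `history` is assumed to be sorted oldest first.
function isEarlyWarning(history: Outcome[], now: number, circuitOpen: boolean): boolean {
  if (circuitOpen) return false; // already tripped: reported under open[], not warnings[]
  let streak = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const outcome = history[i];
    // The streak ends at the first success or at anything older than the window.
    if (outcome.ok || now - outcome.at > WINDOW_MS) break;
    streak++;
  }
  return streak >= WARN_STREAK;
}
```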

🔧

Connecting thClaws

ⓘ

thClaws uses SMLGateway through the OpenAI-compatible endpoint and lets the gateway route to a real model through a virtual model such as sml/auto, sml/fast, or sml/tools

Docker / headless

Use the DashScope-compatible env vars to point thClaws at SMLGateway, then set the model to sml/auto. There is no need to lock to a specific provider/model unless you want to debug an upstream directly.

docker run --rm \
  -e DASHSCOPE_BASE_URL=http://host.docker.internal:3334/v1 \
  -e DASHSCOPE_API_KEY=dummy \
  -e THCLAWS_DISABLE_KEYCHAIN=1 \
  -v "$PWD:/workspace" -w /workspace \
  thclaws-smlgateway:local \
  -p -m sml/auto --permission-mode auto \
  "สรุปโปรเจกต์นี้"

Choosing a virtual model for the task

`sml/auto`	Default. Lets SMLGateway choose the provider/model from measured scores
`sml/fast`	Short tasks that need low latency
`sml/tools`	Agent tasks that need function/tool calling

ⓘ

If an upstream model returns a function call as JSON inside message.content, SMLGateway repairs it into the OpenAI tool_calls shape when the function name matches a schema the client sent.
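To make the repair step concrete, here is a minimal sketch. It assumes the upstream JSON looks like `{"name": "...", "arguments": {...}}`; the shapes the gateway actually recognizes, the generated call id, and the function name are not documented here and are assumptions.

```typescript
interface ToolSpec {
  type: "function";
  function: { name: string };
}
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
interface AssistantMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: ToolCall[];
}

// Hypothetical repair: turn a JSON function call found in `content` into `tool_calls`.
function repairToolCall(msg: AssistantMessage, tools: ToolSpec[]): AssistantMessage {
  if (msg.tool_calls?.length || typeof msg.content !== "string") return msg;
  let parsed: unknown;
  try {
    parsed = JSON.parse(msg.content);
  } catch {
    return msg; // ordinary text answer, leave untouched
  }
  if (typeof parsed !== "object" || parsed === null) return msg;
  const { name, arguments: args } = parsed as { name?: unknown; arguments?: unknown };
  // Only repair when the name matches a tool the client actually declared.
  if (typeof name !== "string" || !tools.some((t) => t.function.name === name)) return msg;
  return {
    role: "assistant",
    content: null,
    tool_calls: [
      {
        id: "call_repaired_0", // placeholder id, an assumption
        type: "function",
        function: {
          name,
          // OpenAI's shape carries arguments as a JSON string.
          arguments: typeof args === "string" ? args : JSON.stringify(args ?? {}),
        },
      },
    ],
  };
}
```

Matching against the client's declared tool names is what keeps a model's ordinary JSON answer from being mistaken for a tool call.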

💻

Connecting OpenClaw

ⓘ

OpenClaw is an AI coding assistant that runs in the terminal. Connect it to SMLGateway to use free models with no cap beyond each provider's own quota

Method 1: OpenClaw running in Docker

If OpenClaw runs in a different container from the gateway, use host.docker.internal:

openclaw onboard \
  --non-interactive --accept-risk \
  --auth-choice custom-api-key \
  --custom-base-url http://host.docker.internal:3334/v1 \
  --custom-model-id sml/auto \
  --custom-api-key dummy \
  --custom-compatibility openai \
  --skip-channels --skip-daemon \
  --skip-health --skip-search \
  --skip-skills --skip-ui

Method 2: OpenClaw directly on the host (native)

openclaw onboard \
  --non-interactive --accept-risk \
  --auth-choice custom-api-key \
  --custom-base-url http://localhost:3334/v1 \
  --custom-model-id sml/auto \
  --custom-api-key dummy \
  --custom-compatibility openai \
  --skip-channels --skip-daemon \
  --skip-health --skip-search \
  --skip-skills --skip-ui

Check openclaw.json

After onboarding, ~/.openclaw/openclaw.json looks like this:

{
  "models": {
    "providers": {
      "custom-host-docker-internal-3334": {
        "baseUrl": "http://host.docker.internal:3334/v1",
        "apiKey": "dummy",
        "api": "openai-completions",
        "models": [{ "id": "sml/auto", "contextWindow": 131072 }]
      }
    }
  }
}

ⓘ

If contextWindow is less than 131072, change it to 131072, because OpenClaw sends a very large system prompt • api must be openai-completions

Fixing “origin not allowed”

{
  "apiProvider": "openai-completions",
  "openAiBaseUrl": "http://host.docker.internal:3334/v1",
  "openAiModelId": "sml/auto",
  "openAiApiKey": "dummy",
  "contextWindow": 131072,
  "gateway": {
    "bind": "lan",
    "allowedOrigins": [
      "http://host.docker.internal:3334",
      "http://localhost:3334"
    ]
  }
}

🦎

Connecting Hermes Agent

ⓘ

Hermes Agent is a self-improving AI agent from Nous Research (released Feb 2026, ≥95k⭐) with built-in tools (terminal, file, web search, memory) and skill learning across sessions. Connecting it to SMLGateway only takes setting base_url in ~/.hermes/config.toml

1. Install Hermes

Requires Python 3.11+ (on Windows, WSL2 only; native is not supported)

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# verify the install
hermes --version

Manual alternative: git clone repo → python -m venv venv → pip install -r requirements.txt → python setup.py

2. Point the config at SMLGateway

Edit ~/.hermes/config.toml directly (base_url always overrides the built-in provider):

# ~/.hermes/config.toml
[model]
provider = "custom"
base_url = "http://localhost:3334/v1"
api_key_env = "SML_GATEWAY_KEY"
model = "sml/auto"

# For a Thai-first setup: fall back to sml/thai when the primary fails
[model.fallback]
provider = "custom"
base_url = "http://localhost:3334/v1"
model = "sml/thai"

[agent]
name = "Hermes"
memory = true
skills_dir = "~/.hermes/skills"

3. Add the API key

echo 'SML_GATEWAY_KEY=<sml_live_xxxxxxxxxxxx>' >> ~/.hermes/.env

# Local mode (no auth): dummy works
# echo 'SML_GATEWAY_KEY=dummy' >> ~/.hermes/.env

Keys are created at /admin/keys (owner only)

4. Use it / switch models

hermes "refactor this repo to use async/await"

# switch models while running (Hermes supports live switching)
hermes model   # pick from a list
hermes tools   # enable/disable built-in tools
hermes setup   # wizard that edits everything at once

⚠

Hermes requirement: the model must have a context of ≥ 64k tokens; below that, Hermes rejects it at startup. SMLGateway's sml/auto already filters out models with context < 32k, so most routes qualify, but a route that lands on a model between 32k and 64k can still be rejected. Use the X-SMLGateway-Max-Latency header together with X-SMLGateway-Strategy: strongest to steer selection

Optional: Nous Portal as a fallback

For a dual-provider setup (SMLGateway as primary + Nous Portal as backup), set this in config.toml:

[model.fallback]
provider = "nous-portal"
api_key_env = "NOUS_API_KEY"
model = "Hermes-3-Llama-3.1-405B"

🔗

API Reference

Endpoints

Method	Path	Description
POST	`/v1/chat/completions`	Chat: text / vision / tools / streaming
GET	`/v1/models`	List of all models (OpenAI format)
GET	`/v1/models/:id`	Fetch one model; IDs containing / are supported, e.g. sml/tools, groq/vendor/model
GET	`/v1/models/search`	Search and rank models by category, context, and so on
POST	`/v1/compare`	Send one prompt to several models at once (≤10)
POST	`/v1/structured`	Chat + JSON schema validation + auto-retry
GET	`/v1/trace/:reqId`	View the log of an earlier request
GET	`/v1/prompts`	List saved system prompts
POST	`/v1/prompts`	Create or overwrite a prompt
GET	`/v1/prompts/:name`	Fetch a prompt
PUT	`/v1/prompts/:name`	Update a prompt
DELETE	`/v1/prompts/:name`	Delete a prompt
GET	`/api/my-stats`	Usage summary for your own IP
POST	`/v1/embeddings`	Embeddings (openrouter / mistral / ollama)
POST	`/v1/completions`	Legacy completions

Special response headers

`X-SMLGateway-Model`	the real model that was selected
`X-SMLGateway-Provider`	the provider that was actually called (groq/nvidia/cerebras/...)
`X-SMLGateway-Request-Id`	use with /v1/trace/:reqId to see the details
`X-SMLGateway-Hedge`	true if the response came from the hedge winner
`X-SMLGateway-Cache`	HIT if served from the semantic cache
`X-SMLGateway-Consensus`	list of models (sml/consensus only)
`X-Resceo-Backoff`	true if requests exceed the soft rate limit (a hint, not a block)

Example: Vision (sending an image)

curl http://localhost:3334/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sml/auto",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "อธิบายรูปนี้"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }]
  }'

Example: Tool Calling

curl http://localhost:3334/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sml/tools",
    "messages": [{"role": "user", "content": "กรุงเทพอากาศเป็นยังไง"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "ดูสภาพอากาศเมือง",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'

🛠

Dev Tools: extras for developers

SMLGateway has endpoints that speed up development: no need to write retry logic, know every model, or keep long prompts in code

1. Search models by capability

Find models that are strong in the area you need: category, context, tools support, and so on

# top 3 Thai-language models that accept 200K+ context
curl "http://localhost:3334/v1/models/search?category=thai&min_context=200000&top=3"

# top 5 code models that support tool calling
curl "http://localhost:3334/v1/models/search?category=code&supports_tools=1&top=5"

Query params: category (thai / code / tools / vision / math / reasoning / json / instruction / extraction / classification / comprehension / safety), min_context, max_context, supports_tools, supports_vision, provider, tier, exclude_cooldown, top
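When calling the search endpoint from code, building the query string with `URLSearchParams` avoids manual escaping. The sketch below uses only the parameter names listed above; the helper name is made up, and encoding `false` as `0` is an assumption (the guide only shows `supports_tools=1`).

```typescript
// Parameter names come from the list above; the helper itself is illustrative.
interface ModelSearchParams {
  category?: string;
  min_context?: number;
  max_context?: number;
  supports_tools?: boolean;
  supports_vision?: boolean;
  provider?: string;
  tier?: string;
  exclude_cooldown?: boolean;
  top?: number;
}

function modelSearchUrl(baseUrl: string, params: ModelSearchParams): string {
  const qs = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value === undefined) continue;
    // Booleans are sent as 1/0 (0 for false is an assumption).
    qs.set(key, typeof value === "boolean" ? (value ? "1" : "0") : String(value));
  }
  return `${baseUrl}/models/search?${qs.toString()}`;
}

const thaiTop3 = modelSearchUrl("http://localhost:3334/v1", {
  category: "thai",
  min_context: 200000,
  top: 3,
});
```

`thaiTop3` is the same URL as the first curl example and can be passed straight to `fetch`.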

2. Compare models

Send one prompt to several models at once → compare content + latency

curl -X POST http://localhost:3334/v1/compare \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role":"user","content":"อธิบาย recursion"}],
    "models": [
      "groq/moonshotai/kimi-k2-instruct-0905",
      "cerebras/qwen-3-235b-a22b-instruct-2507",
      "nvidia/meta/llama-4-maverick-17b-128e-instruct"
    ],
    "max_tokens": 200,
    "timeout_ms": 30000
  }'

3. Structured Output (JSON Schema + Auto-retry)

For JSON that must match a given schema: the gateway validates the output and retries (2 times by default), so you do not write parse/retry logic yourself

curl -X POST http://localhost:3334/v1/structured \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sml/auto",
    "messages": [{"role":"user","content":"Describe a fruit"}],
    "schema": {
      "type": "object",
      "required": ["name", "color", "taste"],
      "properties": {
        "name": {"type": "string"},
        "color": {"type": "string"},
        "taste": {"type": "string"},
        "sweetness": {"type": "integer"}
      }
    },
    "max_retries": 2
  }'

# Response: { ok, attempts, data: { name, color, taste, sweetness }, model, provider, latency_ms, request_ids }
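The validate-and-retry behavior can be pictured with a self-contained sketch. It is not the gateway's implementation: the validator covers only `required` plus string/integer types, `generate` stands in for a model call, and treating `max_retries: 2` as "up to 3 attempts in total" is an assumption.

```typescript
interface ObjectSchema {
  required: string[];
  properties: Record<string, { type: "string" | "integer" }>;
}

// Minimal validator: checks required keys and the two primitive types used above.
function validate(data: unknown, schema: ObjectSchema): string[] {
  if (typeof data !== "object" || data === null) return ["not an object"];
  const obj = data as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of schema.required) {
    if (!(key in obj)) errors.push(`missing ${key}`);
  }
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (!(key in obj)) continue;
    const value = obj[key];
    const ok = spec.type === "string" ? typeof value === "string" : Number.isInteger(value);
    if (!ok) errors.push(`${key} is not ${spec.type}`);
  }
  return errors;
}

// `generate` is a stand-in for one model call; it receives the attempt number.
function structured(generate: (attempt: number) => string, schema: ObjectSchema, maxRetries = 2) {
  const maxAttempts = maxRetries + 1;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const data: unknown = JSON.parse(generate(attempt));
      if (validate(data, schema).length === 0) return { ok: true, attempts: attempt, data };
    } catch {
      // Invalid JSON counts as a failed attempt; fall through to the next one.
    }
  }
  return { ok: false, attempts: maxAttempts, data: null };
}
```

The returned `ok`, `attempts`, and `data` fields mirror the response shape shown above; the remaining fields (model, provider, latency_ms, request_ids) are omitted from the sketch.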

4. Prompt Library

Store long system prompts and call them by name, so they do not have to be embedded in client code

# create
curl -X POST http://localhost:3334/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{"name":"pirate","content":"You are a pirate. Short answers only.","description":"Pirate persona"}'

# use in chat: just add "prompt": "pirate"
curl -X POST http://localhost:3334/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"sml/auto","prompt":"pirate","messages":[{"role":"user","content":"how to fish"}]}'

# list all
curl http://localhost:3334/v1/prompts

# update / delete
curl -X PUT    http://localhost:3334/v1/prompts/pirate -d '{...}'
curl -X DELETE http://localhost:3334/v1/prompts/pirate

5. Trace: debug a past request

Every response carries X-SMLGateway-Request-Id → pass it to the trace endpoint to see what happened to that request

# send a normal chat request
curl -D - http://localhost:3334/v1/chat/completions \
  -d '{"model":"sml/auto","messages":[{"role":"user","content":"hi"}]}'
# → response headers include: X-SMLGateway-Request-Id: 5m3obi

# view the trace
curl http://localhost:3334/v1/trace/5m3obi
# → { requestId, found, entry: { resolved_model, provider, latency_ms, input_tokens, ... } }

6. Usage stats for your own IP

curl "http://localhost:3334/api/my-stats?window=24h"
# → { total, success, p50_latency_ms, p95_latency_ms, p99_latency_ms,
#     top_models: [...], by_hour: [...] }
# window: 1h | 6h | 24h | 7d | 30d

7. Control headers: force or avoid providers

`X-SMLGateway-Prefer`	`groq,cerebras`	moves these providers to the top
`X-SMLGateway-Exclude`	`mistral`	removes these providers
`X-SMLGateway-Max-Latency`	`3000`	filters out models whose avg_latency exceeds this (ms)
`X-SMLGateway-Strategy`	`fastest`	sorts by latency, ascending
`X-SMLGateway-Strategy`	`strongest`	sorts by tier + context, descending

curl -X POST http://localhost:3334/v1/chat/completions \
  -H "X-SMLGateway-Prefer: groq,cerebras" \
  -H "X-SMLGateway-Exclude: mistral" \
  -H "X-SMLGateway-Strategy: fastest" \
  -H "X-SMLGateway-Max-Latency: 3000" \
  -d '{"model":"sml/auto","messages":[...]}'
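From an SDK client, the same control headers can be built once and attached to requests. The helper below is an illustrative sketch: the header names come from the table above, while the option names and the function are assumptions.

```typescript
interface RoutingOptions {
  prefer?: string[];
  exclude?: string[];
  maxLatencyMs?: number;
  strategy?: "fastest" | "strongest";
}

// Builds only the headers that were actually requested.
function routingHeaders(options: RoutingOptions): Record<string, string> {
  const headers: Record<string, string> = {};
  if (options.prefer?.length) headers["X-SMLGateway-Prefer"] = options.prefer.join(",");
  if (options.exclude?.length) headers["X-SMLGateway-Exclude"] = options.exclude.join(",");
  if (options.maxLatencyMs !== undefined) {
    headers["X-SMLGateway-Max-Latency"] = String(options.maxLatencyMs);
  }
  if (options.strategy) headers["X-SMLGateway-Strategy"] = options.strategy;
  return headers;
}

const fastRouting = routingHeaders({
  prefer: ["groq", "cerebras"],
  exclude: ["mistral"],
  maxLatencyMs: 3000,
  strategy: "fastest",
});
```

With the OpenAI Node SDK, an object like `fastRouting` can be passed as `defaultHeaders` when constructing the client, or as `headers` in the per-request options.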

🎓

Exam system (Benchmark)

SMLGateway has an automatic exam system for models. It uses “AI grading AI” (one model is the student, another is the teacher) to keep only the models that actually answer correctly

School structure

👑 Principal: 1 model, the one with the highest overall score that also supports tools (settles disputes)
📋 Head (subject teacher): 1 per category, a model scoring ≥ 80% in that category (classification, code, comprehension, extraction, instruction, json, math, reasoning, safety, thai, tools, vision)
👥 Proctor: up to 10 models; they send questions to measure latency and have no grading authority

Selection process

The exam has 4 cumulative levels. On every worker cycle (15 minutes), new models are examined at the level set in worker_state.exam_level, scores are written to model_category_scores, and teachers are appointed automatically. A single model can be head of several categories

🟢 Primary (primary): 5 questions, pass at ≥ 40%
🟡 Middle school (middle): 14 questions, pass at ≥ 50% (default)
🟠 High school (high): 22 questions, pass at ≥ 60%
🔴 University (university): 30 questions, pass at ≥ 70%
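The thresholds translate into concrete numbers of correct answers. The sketch below assumes each question is scored right or wrong with equal weight, which this guide does not state; the table and function names are illustrative.

```typescript
// Question counts and pass marks from the list above.
const EXAM_LEVELS = {
  primary: { questions: 5, passPercent: 40 },
  middle: { questions: 14, passPercent: 50 },
  high: { questions: 22, passPercent: 60 },
  university: { questions: 30, passPercent: 70 },
} as const;

type ExamLevel = keyof typeof EXAM_LEVELS;

// Integer arithmetic avoids floating-point edge cases at the exact threshold.
function passes(level: ExamLevel, correct: number): boolean {
  const { questions, passPercent } = EXAM_LEVELS[level];
  return correct * 100 >= questions * passPercent;
}
```

Under that assumption a model needs 2 of 5 at primary, 7 of 14 at middle, 14 of 22 at high, and 21 of 30 at university.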

Change the level + re-run exams

Set the level in the dashboard's “🏚 ระดับสอบ” (exam level) section (click a card → it saves automatically) or with POST /api/exam-config { "level": "middle" }. To re-examine every model, use the “🔄 สอบใหม่ทุกคน” (re-examine all) button (click twice to confirm) or POST /api/exam-reset, which clears exam_attempts + model_category_scores and triggers the worker immediately

🛠

Troubleshooting

Checklist

Is Docker Desktop running? (green whale icon)
Are the containers healthy? docker ps --filter name=sml-gateway
Does http://localhost:3334/ show the dashboard?
Has the worker finished scanning? How many models are ready? (see the dashboard's “คณะครู” (faculty) section)
Test: does curl http://localhost:3334/v1/models return a list?
If the client runs in Docker: is the base URL host.docker.internal:3334?

404 model not found (sml/tools, groq/vendor/model)

Model IDs that contain /, such as sml/tools or groq/vendor/model, must work normally. Verify directly:

# virtual models (sml/auto, sml/fast, sml/tools, sml/thai, sml/consensus)
curl http://localhost:3334/v1/models/sml/tools
# → { "id": "sml/tools", "object": "model", ... }

# provider/model format
curl http://localhost:3334/v1/models/groq/llama-3.3-70b-versatile
# → { "id": "groq/llama-3.3-70b-versatile", ... }

# if you get HTML or a 404, rebuild the container
docker compose up -d --build sml-gateway

Error 413 (payload too large)

Happens when the context sent is larger than the model can accept. The gateway puts that model on a 15-minute cooldown and automatically falls back to a model with a larger context (up to 3 times). Check that contextWindow in your client config is ≥ 131072

Error 429 (rate limit)

That provider's quota is exhausted. The gateway applies a cooldown based on the fail streak (10s → 20s → 40s → 1m → 2m cap) and switches to another provider. Remaining quota is shown in the dashboard's “โควต้า” (quota) section
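The cooldown ladder above is not a pure doubling (40s is followed by 1m, not 80s), so a lookup table models it more faithfully than a formula. This is a sketch of the schedule as written here, with a hypothetical function name; the gateway's internal representation is not shown in this guide.

```typescript
// Cooldown per consecutive 429, in seconds: 10s → 20s → 40s → 1m → 2m (cap).
const COOLDOWN_STEPS_SECONDS = [10, 20, 40, 60, 120];

function cooldownSeconds(failStreak: number): number {
  if (failStreak <= 0) return 0; // no failures, no cooldown
  // Streaks beyond the last step stay at the 2-minute cap.
  const index = Math.min(failStreak, COOLDOWN_STEPS_SECONDS.length) - 1;
  return COOLDOWN_STEPS_SECONDS[index];
}
```

A client that receives repeated 429s can use the same ladder to decide how long to wait before retrying a pinned provider.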

Cooldown cascade (many 503s)

Frequent 503s mean the candidate pool is too narrow. Try enabling more providers in the setup modal, or check health_logs in the DB:

docker exec sml-gateway-postgres-1 psql -U sml -d smlgateway \
  -c "SELECT COUNT(*) FROM health_logs WHERE cooldown_until > now();"

SMLGateway • AI Gateway • Local Docker only