# PolySearch
Multi-engine web + image search with smart proxy distribution, circuit breakers, structured AI agent output, and a REST API.
```bash
node src/index.js -q "quantum computing" -t both -l 10 -m agent
```
## Features
- **Web + Image search** — both result types, one tool
- **Smart proxy distribution** — least-used selection per hour, balanced across providers
- **Circuit breaker per proxy** — exponential backoff on failure, auto-recovery
- **Multi-provider proxy system** — add Webshare, Oxylabs, BrightData in one file each
- **Multi-engine architecture** — add Brave, Bing, Google in one file each
- **Per-provider metrics** — requests, success rate, latency, hourly usage grouped by provider
- **Dual output modes**:
  - `human` — colorized terminal
  - `agent` — structured JSON with statistics
- **REST API** — single search, batch search, auth with API keys. See [API.md](API.md)
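
The least-used selection and per-proxy circuit breaker listed above can be pictured with a small sketch. This is not the code in `src/http/proxy.js`: the class name `ProxyPool`, its methods, and the cooldown constants are all assumptions made for illustration, and the hourly usage window is left out to keep it short.

```javascript
// Illustrative proxy pool: least-used selection plus a per-proxy circuit breaker.
// All names and constants are hypothetical, not taken from src/http/proxy.js.
class ProxyPool {
  constructor(urls, { baseCooldownMs = 1000, maxCooldownMs = 60000 } = {}) {
    this.baseCooldownMs = baseCooldownMs;
    this.maxCooldownMs = maxCooldownMs;
    this.entries = urls.map((url) => ({ url, used: 0, failures: 0, openUntil: 0 }));
  }

  // Pick the available proxy with the fewest uses; skip proxies whose breaker is open.
  pick(now = Date.now()) {
    const available = this.entries.filter((e) => e.openUntil <= now);
    if (available.length === 0) return null;
    const best = available.reduce((a, b) => (b.used < a.used ? b : a));
    best.used += 1;
    return best.url;
  }

  // On failure, open the breaker for base * 2^(failures - 1) ms, capped at the maximum.
  reportFailure(url, now = Date.now()) {
    const e = this.entries.find((x) => x.url === url);
    if (!e) return;
    e.failures += 1;
    const cooldown = Math.min(this.baseCooldownMs * 2 ** (e.failures - 1), this.maxCooldownMs);
    e.openUntil = now + cooldown;
  }

  // A success resets the failure count, so the next failure starts from the base cooldown.
  reportSuccess(url) {
    const e = this.entries.find((x) => x.url === url);
    if (e) e.failures = 0;
  }
}
```

Passing `now` explicitly keeps the sketch deterministic; "auto-recovery" here simply means a proxy becomes selectable again once its cooldown has elapsed.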
## Requirements
Node.js 18+
## Quick start
```bash
# Image search (the default type)
node src/index.js -q "vintage radio"
# Web search
node src/index.js -q "quantum computing" -t web
# Both types, AI agent JSON
node src/index.js -q "spacex starship" -t both -l 10 -m agent
# Show proxy metrics after a search
node src/index.js -q "mars rover" -M
```
## CLI
| Flag | Long | Description | Default |
|------|------|-------------|---------|
| `-q` | `--query` | Search query | — |
| `-t` | `--type` | `web`, `image`, or `both` | `image` |
| `-l` | `--limit` | Max results per type | `10` |
| `-m` | `--mode` | `human` or `agent` | `human` |
| `-p` | `--proxy` | Single proxy URL override | — |
| `-c` | `--config` | Path to config file | auto-detect |
| `-M` | `--metrics` | Dump proxy pool metrics | — |
| | `--serve` | Start REST API server | — |
| | `--port` | API server port | `9876` |
| | `--generate-key` | Generate API key | — |
| `-h` | `--help` | Show help | — |
## REST API
The REST API is intended for consumption by AI agents. See [API.md](API.md) for full documentation.
```bash
node src/index.js --generate-key # create an API key
node src/index.js --serve --port 9876 # start the server
```
**Endpoints:** `GET /health`, `POST /search`, `POST /batch`, `GET /metrics`
---
## Proxy providers
Providers are auto-discovered from environment variables:
| Provider | Env vars | Type |
|----------|----------|------|
| Webshare | `WEBSHARE_API_KEY` | API-fetched, 10 rotating IPs |
| Oxylabs | `OXYLABS_USERNAME`, `OXYLABS_PASSWORD`, `OXYLABS_COUNTRY` | Single datacenter endpoint |
Add a new provider by creating a file in `src/http/providers/` that calls `registerProvider(name, fetcher)`. The fetcher returns an array of proxy URL strings.
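
As a rough sketch of what such a provider file might look like: the stand-in `registerProvider` below exists only so the example runs on its own, and the provider name `example`, the `EXAMPLE_PROXY_*` variables, and the synchronous fetcher are all assumptions rather than part of the project. A real fetcher that calls a provider API would likely be asynchronous.

```javascript
// Stand-in registry so this sketch is self-contained. In the project, the real
// registerProvider comes from src/http/providers/ (its exact export is assumed).
const providers = new Map();
function registerProvider(name, fetcher) {
  providers.set(name, fetcher);
}

// Hypothetical provider: builds proxy URLs from EXAMPLE_PROXY_* environment variables.
function exampleFetcher(env = process.env) {
  const user = env.EXAMPLE_PROXY_USER;
  const pass = env.EXAMPLE_PROXY_PASS;
  const hosts = env.EXAMPLE_PROXY_HOSTS; // comma-separated host:port list
  if (!user || !pass || !hosts) return []; // not configured: contribute no proxies
  return hosts
    .split(',')
    .map((h) => h.trim())
    .filter(Boolean)
    // Percent-encode credentials so characters like "@" do not break the URL.
    .map((h) => `http://${encodeURIComponent(user)}:${encodeURIComponent(pass)}@${h}`);
}

registerProvider('example', exampleFetcher);
```

Returning an empty array when the variables are unset mirrors the auto-discovery idea: an unconfigured provider simply adds nothing to the pool.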
## Engine architecture
Engines are registered in `src/engines/setup.js`. Each engine supports `web`, `image`, or both. DuckDuckGo is the default. Add Brave, Bing, or custom engines by implementing the `search(query, opts)` interface.
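
To make the `search(query, opts)` interface concrete, here is a minimal sketch of a custom engine. Only the method name and its two parameters come from the text above; the class name `StaticEngine`, the `types` property, the `type` and `limit` option fields, and the result shape are assumptions. It serves canned records instead of calling a real search backend.

```javascript
// Hypothetical engine that filters an in-memory list; a real engine would
// query a search backend through the project's HTTP client.
class StaticEngine {
  constructor(name, records) {
    this.name = name;
    this.types = ['web']; // assumed way of declaring supported result types
    this.records = records;
  }

  // The opts fields (type, limit) and the returned objects are assumptions.
  async search(query, opts = {}) {
    const { type = 'web', limit = 10 } = opts;
    if (!this.types.includes(type)) return []; // unsupported result type
    const needle = query.toLowerCase();
    return this.records
      .filter((r) => r.title.toLowerCase().includes(needle))
      .slice(0, limit);
  }
}
```

An engine like this would then be added in `src/engines/setup.js`; the exact registration call is not shown here because it is not described in this README.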
## Project structure
```
src/
├── index.js          # CLI + programmatic API + API server dispatch
├── api.js            # REST API server (/search, /batch, /metrics, /health)
├── api-key.js        # Key generation + env storage
├── cli.js            # Argument parsing
├── config.js         # Config loader (json + env + providers)
├── run.js            # Search orchestration + engine fallback
├── engines/
│   ├── base.js       # Abstract engine interface
│   ├── index.js      # Engine registry
│   ├── setup.js      # Built-in engine registration
│   └── duckduckgo.js # DuckDuckGo (web + image)
├── http/
│   ├── client.js     # Fetch wrapper (proxy, retry, timeout, UA)
│   ├── proxy.js      # Proxy pool (least-used, circuit breaker, metrics)
│   └── providers/
│       ├── index.js    # Provider registry
│       ├── webshare.js # Webshare.io
│       └── oxylabs.js  # Oxylabs datacenter
├── output/
│   ├── human.js      # Terminal formatting
│   └── agent.js      # JSON formatting
└── utils/
    ├── logger.js     # Pino structured logging
    ├── retry.js      # Exponential backoff + jitter
    └── ua.js         # User-agent pool
```