Concurrent PDF resolver for academic papers — 9 open-access sources.
§What it does
Given a DOI, URL, or title, queries up to 9 academic sources in parallel
and returns the best downloadable PDF URL. No Zotero, no reference manager
dependency — just (doi, url, title) → Option<ResolvedPdf>.
§Sources (by priority)
| # | Source | Coverage | Method |
|---|---|---|---|
| 1 | arXiv | arXiv papers | Instant DOI/URL pattern match (no network) |
| 2 | OpenAlex | 250M+ works | Structured OA location data |
| 3 | CORE | 300M+ OA works | Institutional repositories |
| 4 | Google Scholar | Widest coverage | HTML scraping (rate-limit risk) |
| 5 | Unpaywall | 30M+ OA articles | DOI lookup (requires email) |
| 6 | Crossref | Publisher links | DOI metadata |
| 7 | Zenodo | Cross-disciplinary | CERN-hosted general-purpose repository |
| 8 | SSRN | Finance/economics | Preprint abstracts |
| 9 | Semantic Scholar | CS/bio/med | OA PDFs + disclaimer parsing |
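The arXiv row can be resolved without touching the network because arXiv-issued DOIs and URLs embed the paper identifier. The following is a minimal sketch of that kind of pattern match, not the crate's actual implementation: the helper names `arxiv_id` and `arxiv_pdf_url` are hypothetical, and the sketch assumes the `10.48550/arXiv.<id>` DOI form and the `arxiv.org/abs/` and `arxiv.org/pdf/` URL forms.

```rust
/// Hypothetical helper (not part of paper_resolver): pull an arXiv
/// identifier out of a DOI or URL using string matching only.
fn arxiv_id(doi: Option<&str>, url: Option<&str>) -> Option<String> {
    // arXiv-issued DOIs look like 10.48550/arXiv.<id>; DOIs are case-insensitive.
    const DOI_PREFIX: &str = "10.48550/arxiv.";
    if let Some(doi) = doi {
        let doi = doi.trim();
        if let Some(head) = doi.get(..DOI_PREFIX.len()) {
            if head.eq_ignore_ascii_case(DOI_PREFIX) && doi.len() > DOI_PREFIX.len() {
                return Some(doi[DOI_PREFIX.len()..].to_string());
            }
        }
    }
    if let Some(url) = url {
        for marker in ["arxiv.org/abs/", "arxiv.org/pdf/"] {
            if let Some(pos) = url.find(marker) {
                let rest = &url[pos + marker.len()..];
                // Drop any query string or fragment, then a trailing ".pdf" or slash.
                let rest = rest
                    .split(|c: char| c == '?' || c == '#')
                    .next()
                    .unwrap_or(rest);
                let id = rest.trim_end_matches(".pdf").trim_end_matches('/');
                if !id.is_empty() {
                    return Some(id.to_string());
                }
            }
        }
    }
    None
}

/// Hypothetical helper: build the PDF URL for an arXiv identifier.
fn arxiv_pdf_url(id: &str) -> String {
    format!("https://arxiv.org/pdf/{}", id)
}
```

With these helpers, `arxiv_id(Some("10.48550/arXiv.1706.03762"), None)` yields the identifier `1706.03762`, which `arxiv_pdf_url` turns into a direct PDF link. A non-arXiv DOI falls through to `None`, leaving the remaining sources to handle it.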
§Quick start
// Resolve a PDF by DOI (sync: creates its own tokio runtime):
let result = paper_resolver::resolve_pdf(
    Some("10.48550/arXiv.1706.03762"), // DOI
    None, // URL
    Some("Attention Is All You Need"), // title (fallback)
);
if let Some(pdf) = result {
    println!("Found: {} (via {})", pdf.url, pdf.source);
    println!("Downloadable: {}", pdf.downloadable);
}
§Detailed reporting
Use resolve_pdf_with_report to see why each source succeeded or failed:
let report = paper_resolver::resolve_pdf_with_report(
    Some("10.1109/TSE.2010.62"), None,
    Some("mutation testing"),
    &paper_resolver::ResolverConfig::default(),
);
println!("{}", report.summary());
// PDF found via google_scholar (downloadable: true)
// https://mutationtesting.uni.lu/TR-09-06.pdf
//
// Sources queried:
// openalex: no OA location found
// google_scholar: found https://...
// unpaywall: skipped — configure resolver.email
// ...
§Custom configuration
use paper_resolver::{ResolverConfig, SourceEntry};
let mut config = ResolverConfig::default();
config.email = "researcher@university.edu".into();
config.timeout_secs = 10;
config.sources = vec![
    SourceEntry::new("arxiv", true),
    SourceEntry::new("openalex", true),
    SourceEntry::new("unpaywall", true),
    // disable the rest
];
let result = paper_resolver::resolve_pdf_with_config(
    Some("10.1234/example"), None, None, &config,
);
§Design decisions
- Concurrent by default: all enabled sources are queried simultaneously via futures::future::join_all. Among the sources that return a PDF, the highest-priority one wins.
- Blocked domains: publisher paywalls (IEEE, Springer, Elsevier, etc.) are detected and marked downloadable: false rather than silently failing.
- No file I/O: this crate has zero filesystem dependency. Configuration is passed as a struct, so the caller owns config file parsing.
- Standalone: no Zotero dependency. Usable in any Rust project that needs academic PDF resolution.
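The first two decisions can be sketched together. This is an illustration under stated assumptions, not the crate's code: scoped threads from the standard library stand in for futures::future::join_all so the example needs no async runtime, and the `Hit` type, the `Lookup` alias, the `BLOCKED_HOSTS` list, and the stand-in lookups are all hypothetical. It also assumes a paywalled hit is still returned (flagged as not downloadable) rather than skipped in favour of a lower-priority source.

```rust
use std::thread;

/// Hypothetical per-source outcome; the crate's real types may differ.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    source: &'static str,
    url: String,
    downloadable: bool,
}

/// Hypothetical lookup signature: returns a candidate PDF URL, if any.
type Lookup = fn() -> Option<String>;

/// Illustrative subset of paywalled hosts (assumed, not the crate's list).
const BLOCKED_HOSTS: [&str; 3] = [
    "ieeexplore.ieee.org",
    "link.springer.com",
    "www.sciencedirect.com",
];

/// True when the URL's host is on the blocked list.
fn is_blocked(url: &str) -> bool {
    let after_scheme = url.split("://").nth(1).unwrap_or(url);
    let host = after_scheme.split('/').next().unwrap_or("");
    BLOCKED_HOSTS.iter().any(|b| host.eq_ignore_ascii_case(b))
}

/// Start every lookup at once, wait for all of them, then return the hit
/// from the highest-priority source (the earliest entry in `sources`).
fn resolve_by_priority(sources: &[(&'static str, Lookup)]) -> Option<Hit> {
    let results: Vec<Option<Hit>> = thread::scope(|s| {
        let handles: Vec<_> = sources
            .iter()
            .map(|&(name, lookup)| {
                s.spawn(move || {
                    lookup().map(|url| Hit {
                        source: name,
                        downloadable: !is_blocked(&url),
                        url,
                    })
                })
            })
            .collect();
        // Joining in source order keeps results aligned with priority.
        handles
            .into_iter()
            .map(|h| h.join().ok().flatten())
            .collect()
    });
    results.into_iter().flatten().next()
}

// Stand-in lookups; a real source would perform an HTTP request here.
fn miss() -> Option<String> {
    None
}
fn open_copy() -> Option<String> {
    Some("https://example.org/paper.pdf".to_string())
}
fn paywalled() -> Option<String> {
    Some("https://ieeexplore.ieee.org/document/1".to_string())
}
```

Calling `resolve_by_priority` with `miss`, `open_copy`, and `paywalled` in that order returns the `open_copy` hit with `downloadable` set to true. With only `paywalled` enabled, the hit is still returned but flagged `downloadable: false`, so the caller can tell a paywall apart from a failed lookup.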
Structs§
- Endpoints: Base URLs for each source, overridable for testing.
- ResolveReport: Detailed resolution result, including per-source failure reasons.
- ResolvedPdf: Result of PDF URL resolution.
- ResolverConfig: Configuration for the paper resolver.
- SourceEntry: A source entry (name plus enabled flag).
Constants§
- SOURCE_NAMES: All available source names.
Functions§
- resolve_pdf: Resolve a PDF URL using all available sources (default config).
- resolve_pdf_async: Async version with configuration; the caller owns the tokio runtime.
- resolve_pdf_async_with_report: Async version with detailed per-source reporting.
- resolve_pdf_with_config: Resolve a PDF URL with custom configuration.
- resolve_pdf_with_report: Resolve with detailed per-source reporting.