Crate paper_resolver

Concurrent PDF resolver for academic papers — 9 open-access sources.

§What it does

Given a DOI, URL, or title, queries up to 9 academic sources in parallel and returns the best downloadable PDF URL. No Zotero, no reference manager dependency — just (doi, url, title) → Option<ResolvedPdf>.

§Sources (by priority)

| # | Source           | Coverage           | Method                                     |
|---|------------------|--------------------|--------------------------------------------|
| 1 | arXiv            | arXiv papers       | Instant DOI/URL pattern match (no network) |
| 2 | OpenAlex         | 250M+ works        | Structured OA location data                |
| 3 | CORE             | 300M+ OA works     | Institutional repositories                 |
| 4 | Google Scholar   | Widest coverage    | HTML scraping (rate-limit risk)            |
| 5 | Unpaywall        | 30M+ OA articles   | DOI lookup (requires email)                |
| 6 | Crossref         | Publisher links    | DOI metadata                               |
| 7 | Zenodo           | Cross-disciplinary | CERN preprint repository                   |
| 8 | SSRN             | Finance/economics  | Preprint abstracts                         |
| 9 | Semantic Scholar | CS/bio/med         | OA PDFs + disclaimer parsing               |
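
The arXiv entry is described as a pattern match on the DOI or URL that needs no network access. The sketch below shows one way such a match could work; it is not taken from the crate's source. The function names (`arxiv_pdf_url`, `id_from_doi`, `id_from_url`, `is_new_style_id`) are hypothetical, and only new-style identifiers such as `1706.03762` are handled.

```rust
/// Derive an arXiv PDF URL from a DOI or a URL without any network call.
/// Hypothetical helper: handles only new-style IDs like `1706.03762` or `1706.03762v5`.
fn arxiv_pdf_url(doi: Option<&str>, url: Option<&str>) -> Option<String> {
    let id = doi
        .and_then(id_from_doi)
        .or_else(|| url.and_then(id_from_url))?;
    Some(format!("https://arxiv.org/pdf/{id}"))
}

/// arXiv-issued DOIs have the form `10.48550/arXiv.<id>` (matched case-insensitively).
fn id_from_doi(doi: &str) -> Option<&str> {
    let doi = doi.trim();
    let prefix = "10.48550/arxiv.";
    if doi.to_ascii_lowercase().starts_with(prefix) {
        // The matched prefix is pure ASCII, so slicing at its byte length is safe.
        let id = &doi[prefix.len()..];
        is_new_style_id(id).then_some(id)
    } else {
        None
    }
}

/// Accepts abstract or PDF links such as `https://arxiv.org/abs/<id>`.
fn id_from_url(url: &str) -> Option<&str> {
    for marker in ["arxiv.org/abs/", "arxiv.org/pdf/"] {
        if let Some(pos) = url.find(marker) {
            let rest = &url[pos + marker.len()..];
            // Drop any query string or fragment, then an optional `.pdf` suffix.
            let rest = rest.split(|c| c == '?' || c == '#').next().unwrap_or(rest);
            let rest = rest.strip_suffix(".pdf").unwrap_or(rest);
            if is_new_style_id(rest) {
                return Some(rest);
            }
        }
    }
    None
}

/// New-style IDs: four digits, a dot, four or five digits, optional `v<digits>`.
fn is_new_style_id(id: &str) -> bool {
    let (base, version) = match id.find('v') {
        Some(i) => (&id[..i], Some(&id[i + 1..])),
        None => (id, None),
    };
    let version_ok =
        version.map_or(true, |v| !v.is_empty() && v.bytes().all(|b| b.is_ascii_digit()));
    match base.split_once('.') {
        Some((yymm, num)) => {
            version_ok
                && yymm.len() == 4
                && (4..=5).contains(&num.len())
                && yymm.bytes().chain(num.bytes()).all(|b| b.is_ascii_digit())
        }
        None => false,
    }
}
```

Under these assumptions, `arxiv_pdf_url(Some("10.48550/arXiv.1706.03762"), None)` yields `https://arxiv.org/pdf/1706.03762`, while a non-arXiv DOI such as `10.1109/TSE.2010.62` yields `None` and would fall through to the networked sources.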

§Quick start

// Resolve a PDF by DOI (sync — creates its own tokio runtime):
let result = paper_resolver::resolve_pdf(
    Some("10.48550/arXiv.1706.03762"), // DOI
    None,                              // URL
    Some("Attention Is All You Need"), // title (fallback)
);

if let Some(pdf) = result {
    println!("Found: {} (via {})", pdf.url, pdf.source);
    println!("Downloadable: {}", pdf.downloadable);
}

§Detailed reporting

Use resolve_pdf_with_report to see why each source succeeded or failed:

let report = paper_resolver::resolve_pdf_with_report(
    Some("10.1109/TSE.2010.62"), None,
    Some("mutation testing"),
    &paper_resolver::ResolverConfig::default(),
);
println!("{}", report.summary());
// PDF found via google_scholar (downloadable: true)
//   https://mutationtesting.uni.lu/TR-09-06.pdf
//
// Sources queried:
//   openalex: no OA location found
//   google_scholar: found https://...
//   unpaywall: skipped — configure resolver.email
//   ...

§Custom configuration

use paper_resolver::{ResolverConfig, SourceEntry};

let mut config = ResolverConfig::default();
config.email = "researcher@university.edu".into();
config.timeout_secs = 10;
config.sources = vec![
    SourceEntry::new("arxiv", true),
    SourceEntry::new("openalex", true),
    SourceEntry::new("unpaywall", true),
    // disable the rest
];

let result = paper_resolver::resolve_pdf_with_config(
    Some("10.1234/example"), None, None, &config,
);
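
Because the crate performs no file I/O, turning a config file into a configuration struct is left to the caller. The sketch below shows one possible approach for a simple `key = value` text format. Everything in it is an assumption made for illustration: the file format, the `ParsedConfig` and `SourceToggle` types (local stand-ins that mirror the `email`, `timeout_secs`, and `sources` fields used above), and the fallback timeout of 30 seconds. A real caller would copy the parsed values into `ResolverConfig` and `SourceEntry`.

```rust
/// Local stand-in for a source entry: name plus enabled flag (hypothetical type).
#[derive(Debug, PartialEq)]
struct SourceToggle {
    name: String,
    enabled: bool,
}

/// Local stand-in mirroring the configuration fields shown above (hypothetical type).
#[derive(Debug, PartialEq)]
struct ParsedConfig {
    email: String,
    timeout_secs: u64,
    sources: Vec<SourceToggle>,
}

/// Parse lines such as `email = a@b.org`, `timeout_secs = 10`, `source.arxiv = true`.
/// Blank lines and lines starting with `#` are ignored.
fn parse_config(text: &str) -> Result<ParsedConfig, String> {
    // The 30-second fallback is this sketch's choice, not the crate's default.
    let mut cfg = ParsedConfig {
        email: String::new(),
        timeout_secs: 30,
        sources: Vec::new(),
    };
    for (n, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `key = value`", n + 1))?;
        let (key, value) = (key.trim(), value.trim());
        match key {
            "email" => cfg.email = value.to_string(),
            "timeout_secs" => {
                cfg.timeout_secs = value
                    .parse::<u64>()
                    .map_err(|e| format!("line {}: {e}", n + 1))?;
            }
            _ => match key.strip_prefix("source.") {
                Some(name) => {
                    let enabled = value
                        .parse::<bool>()
                        .map_err(|e| format!("line {}: {e}", n + 1))?;
                    cfg.sources.push(SourceToggle { name: name.to_string(), enabled });
                }
                None => return Err(format!("line {}: unknown key `{key}`", n + 1)),
            },
        }
    }
    Ok(cfg)
}
```

The caller reads the file however it likes (for example with `std::fs::read_to_string`), passes the text to `parse_config`, and maps the result onto the crate's own types before calling `resolve_pdf_with_config`.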

§Design decisions

  • Concurrent by default: all enabled sources are queried simultaneously via futures::future::join_all. Because join_all waits for every query to finish, the winner is chosen by source priority, not by which source answers first.
  • Blocked domains: publisher paywalls (IEEE, Springer, Elsevier, etc.) are detected and marked downloadable: false rather than silently failing.
  • No file I/O: this crate has zero filesystem dependency. Configuration is passed as a struct — the caller owns config file parsing.
  • Standalone: no Zotero dependency. Usable in any Rust project that needs academic PDF resolution.
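
The priority and blocked-domain rules above can be illustrated without any async machinery. The sketch below assumes the per-source results have already been collected (as `join_all` would return them) and shows one plausible selection policy: prefer downloadable candidates, then the lowest priority number, and fall back to a paywalled link marked `downloadable: false` when nothing else was found. The `Candidate` type, the `BLOCKED_HOSTS` list, and the helper functions are hypothetical; the crate's actual list of blocked domains and its tie-breaking rules are not documented here.

```rust
/// One per-source result (hypothetical type; priority 1 is the highest).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    source: &'static str,
    priority: u8,
    url: String,
    downloadable: bool,
}

/// Illustrative paywall hosts only; the crate's real list is not shown in its docs.
const BLOCKED_HOSTS: [&str; 3] = [
    "ieeexplore.ieee.org",
    "link.springer.com",
    "sciencedirect.com",
];

/// Extract the host part of a URL with plain string operations.
fn host_of(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    rest.split('/').next().unwrap_or(rest)
}

/// A URL is blocked if its host equals a blocked host or is a subdomain of one.
fn is_blocked(url: &str) -> bool {
    let host = host_of(url).to_ascii_lowercase();
    BLOCKED_HOSTS
        .iter()
        .any(|b| host == *b || host.ends_with(&format!(".{b}")))
}

/// Build a candidate, marking paywalled links instead of discarding them.
fn candidate(source: &'static str, priority: u8, url: &str) -> Candidate {
    Candidate {
        source,
        priority,
        url: url.to_string(),
        downloadable: !is_blocked(url),
    }
}

/// Pick the winner from all finished queries; `None` entries are sources that found nothing.
fn pick_best(results: Vec<Option<Candidate>>) -> Option<Candidate> {
    results
        .into_iter()
        .flatten()
        // `false < true`, so downloadable candidates sort first, then by priority.
        .min_by_key(|c| (!c.downloadable, c.priority))
}
```

With this policy, a downloadable link from a lower-priority source beats a paywalled link from a higher-priority one, and a request for which only a publisher link exists still returns that link so the caller can report it.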

Structs§

Endpoints
Base URLs for each source — overridable for testing.
ResolveReport
Detailed resolution result — includes per-source failure reasons.
ResolvedPdf
Result of PDF URL resolution.
ResolverConfig
Configuration for the paper resolver.
SourceEntry
A source entry — name + enabled flag.

Constants§

SOURCE_NAMES
All available source names.

Functions§

resolve_pdf
Resolve a PDF URL using all available sources (default config).
resolve_pdf_async
Async version with configuration — caller owns the tokio runtime.
resolve_pdf_async_with_report
Async version with detailed per-source reporting.
resolve_pdf_with_config
Resolve a PDF URL with custom configuration.
resolve_pdf_with_report
Resolve with detailed per-source reporting.