Quick start

CLI

# Run all stages declared in the config
python -m pysyrev config.yaml

# Or via the installed entry point
pysyrev config.yaml

# Run a single stage
python -m pysyrev config.yaml --stage bib
python -m pysyrev config.yaml --stage review
python -m pysyrev config.yaml --stage bib-network
python -m pysyrev config.yaml --stage topic-model
python -m pysyrev config.yaml --stage topic-report
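
To run several stages in sequence from a script, the single-stage commands above can be wrapped in a small driver. The following is only a sketch: `build_stage_command` and `run_stages` are hypothetical helpers, not part of pysyrev, and the sketch assumes that a failed stage exits with a non-zero status, which this document does not confirm.

```python
import subprocess
import sys

# Stage names exactly as they appear in the CLI examples above.
STAGES = ["bib", "review", "bib-network", "topic-model", "topic-report"]


def build_stage_command(config_path, stage):
    """Return the argv list that runs one pysyrev stage (hypothetical helper)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    # Same invocation as `python -m pysyrev config.yaml --stage <name>`.
    return [sys.executable, "-m", "pysyrev", config_path, "--stage", stage]


def run_stages(config_path, stages):
    """Run the given stages in order, stopping at the first failure."""
    for stage in stages:
        # Assumption: a failed stage returns a non-zero exit code, in which
        # case check=True raises subprocess.CalledProcessError.
        subprocess.run(build_stage_command(config_path, stage), check=True)
```

With pysyrev installed, `run_stages("config.yaml", ["bib", "review"])` would run the two stages back to back.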

Python API

from pysyrev import Pipeline

# Full pipeline in one call
pipeline = Pipeline.from_config("config.yaml")
pipeline.run()

# Or stage by stage — results persist between calls
pipeline.run(stages=["bib"])
pipeline.run(stages=["review"])       # uses pipeline.bib.dataset automatically
pipeline.run(stages=["topic-report"]) # generates the PDF report

# Access results
df_all  = pipeline.bib.dataset           # pd.DataFrame — all collected documents
df_kept = pipeline.review.included_docs  # pd.DataFrame — LLM-screened inclusions
network = pipeline.network               # BibNetwork
topic   = pipeline.topic                 # TopicModel
report  = pipeline.report                # TopicReport

Report-only run

A config containing only topic_report (and optionally report and llm) is valid. This lets you regenerate a report from a previous topic-model run without re-running the full pipeline:

# report_only.yaml
topic_report:
  run_dir: /path/to/topic_modeling/run_2026-05-01T120000/
  model_index: 0
  export_to: /path/to/output/report/

python -m pysyrev report_only.yaml
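
If report-only configs are generated by a script (say, one per earlier run), something like the following could write the file. This is a sketch under stated assumptions: `write_report_only_config` is a hypothetical helper, not a pysyrev function, and it emits only the three `topic_report` keys shown above. Paths are written as JSON strings, which are also valid YAML double-quoted scalars, so paths containing spaces or `#` stay intact.

```python
import json
from pathlib import Path


def write_report_only_config(config_path, run_dir, export_to, model_index=0):
    """Write a minimal report-only config and return its text (hypothetical helper)."""
    lines = [
        "topic_report:",
        # json.dumps adds double quotes and escapes special characters.
        f"  run_dir: {json.dumps(str(run_dir))}",
        f"  model_index: {int(model_index)}",
        f"  export_to: {json.dumps(str(export_to))}",
    ]
    text = "\n".join(lines) + "\n"
    Path(config_path).write_text(text, encoding="utf-8")
    return text
```

The resulting file can then be passed to `python -m pysyrev` as in the command above.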

Auto-detection between stages

When a doc_dataset or run_dir field is left blank, pysyrev auto-detects the most recent output of the upstream stage that feeds it:

| Blank field | Auto-detected from |
| --- | --- |
| review.doc_dataset | latest run in bib.export.export_dir |
| bib_network.doc_dataset | latest run in review.export.export_dir |
| topic_model.doc_dataset | latest run in review.export.export_dir |
| topic_report.run_dir | latest run in topic_model.export.export_dir |
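
To see which directory "latest run" would probably resolve to before leaving a field blank, a lookup along these lines may help. It is a sketch, not pysyrev's actual detection logic, which this document does not describe. It assumes run directories are named `run_<timestamp>` in the style of `run_2026-05-01T120000`, so that sorting by name is the same as sorting by time. `latest_run_dir` is a hypothetical helper.

```python
from pathlib import Path


def latest_run_dir(export_dir):
    """Return the newest run_* subdirectory of export_dir, or None if there is none."""
    # Keep directories only; stray files that match the pattern are ignored.
    runs = [p for p in Path(export_dir).glob("run_*") if p.is_dir()]
    # Assumption: names such as run_2026-05-01T120000 sort chronologically
    # when compared as plain strings.
    return max(runs, key=lambda p: p.name, default=None)
```

If the real naming scheme differs, comparing modification times (`p.stat().st_mtime`) instead of names would be the obvious alternative.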