Configuration reference
========================

All pipeline behaviour is driven by a single YAML file.  Copy
``pysyrev/templates/config.yaml`` and fill in only the sections you need —
absent sections are skipped entirely.

.. code-block:: python

   # Load programmatically
   from pysyrev.core.config import Config

   cfg = Config.load("my_config.yaml")

Environment variables
---------------------

Any ``${VAR}`` reference in the YAML is resolved against the ``.env`` file
pointed to by the root-level ``env:`` key:

.. code-block:: yaml

   env: /path/to/.env

   bib:
     wos:
       api:
         api_key: ${WOS_API_KEY}

``bib`` — bibliography collection
-----------------------------------

.. code-block:: yaml

   bib:
     wos:
       source: file                       # 'file' | 'api'
       file: /path/to/savedrecs.bib       # file path or directory of .bib files
       # api:
       #   api_key: ${WOS_API_KEY}
       #   query: 'ALL=(your query) AND PY=2015-2024'
       #   cache_dir: /path/to/cache/

     open_alex:
       source: api
       api:
         api_key: ${OPENALEX_API_KEY}
         email: ${OPENALEX_EMAIL}
         query: your search terms
         filters:
           publication_year: '2015-2024'
           type: article
         cache_dir: /path/to/cache/

     # scopus: /path/to/scopus.csv
     # pubmed: /path/to/pubmed.nbib

     clean:
       min_signals_to_reject: 2           # garbage signals needed to drop an abstract
       use_langdetect: false

     extract:
       year: 2000                         # min publication year (inclusive)
       language: english                  # str, list of str, or blank (keep all)
       # include_doc_type: [article, review]
       # exclude_doc_type: [peer review]

     merge:
       title_similarity: 98               # fuzzy threshold (0–100)

     resolve_references:
       enabled: false                     # opt-in — expensive
       flag_unresolved: false

     export:
       export_dir: /path/to/bib_results/
       run_name:                          # blank → auto-timestamp

.. note::

   When ``wos.source: file`` and ``file`` points to a **directory**, all
   ``.bib`` files in that directory are read and concatenated automatically.
   This handles WoS exports split into chunks of 500 or 1 000 records.

``review`` — LLM-based screening
----------------------------------

.. code-block:: yaml

   review:
     doc_dataset:                         # blank → auto-detect latest bib run
     export:
       export_dir: /path/to/review_results/
     text_inputs: [title, abstract, keywords]
     inclusion_criteria: |
     exclusion_criteria: |
       1.
     decision_rule: majority              # majority | mean
     max_concurrent_requests: 5
     items_per_call: 5
     workflow:
       - round: A
         reviewers: [Reviewer1, Reviewer2]
     reviewers:
       - name: Reviewer1
         provider: anthropic
         model_id: claude-haiku-4-5
         max_tokens: 200
         temperature: 0.7
         reasoning: brief                 # brief | cot
         backstory: |

``bib_network`` — bibliographic networks
-----------------------------------------

.. code-block:: yaml

   bib_network:
     doc_dataset:                         # blank → auto-detect latest review run
     coupling_network:
       use_resolved: true
       use_unresolved: true
       min_shared: 1
     cocitation_network:
       use_resolved: true
       use_unresolved: true
       min_cocitations: 1
     export:
       export_dir: /path/to/bib_network_results/

``topic_model`` — BERTopic clustering
---------------------------------------

.. code-block:: yaml

   topic_model:
     doc_dataset:                         # blank → auto-detect latest review run
     export:
       export_dir: /path/to/topic_modeling/
     distance: euclidean                  # euclidean | chebyshev
     keep_n_results: 10
     hdbscan:
       min_topic_size_range: [10, 50]
       min_sample_range: [2, 10]
       topic_size_step: 4
       min_sample_step: 2
     umap:
       n_neighbors: [5, 10]
       n_components: [5, 10, 15]

``topic_report`` — PDF report generation
-----------------------------------------

.. code-block:: yaml

   topic_report:
     model_index: 0                       # index in best_results.csv (0 = best)
     export_to: /path/to/report/
     # run_dir:                           # blank → auto-detect latest topic_model run

``llm`` — topic label generation
----------------------------------

When present, an LLM generates human-readable labels for each topic:

.. code-block:: yaml

   llm:
     provider: anthropic
     model_id: claude-haiku-4-5-20251001
     max_tokens: 200
     temperature: 0.3
     n_repr_docs_for_labeling: 3

``report`` — PDF layout
------------------------

.. code-block:: yaml

   report:
     meta:
       title: My Literature Review
       author: My Name
       date_format: "%d/%m/%Y"
     sections:
       topics:
         n_repr_docs_per_topic: 5
       temporal:
         variants: [absolute, cumulative, normalized, weighted]
       topic_characteristics:
         n_top_cited_per_topic: 5
         n_top_cited_global: 50
       paper_selection:
         min_year: 2020
         proportion_per_topic: 0.15
         selection_by: citations          # citations | random
         export_annex: true
         annex_format: csv                # csv | txt
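
The ``date_format`` value under ``report.meta`` looks like a standard
``strftime`` pattern.  Assuming that is how the report renders its date (the
reference above does not say so explicitly), a pattern can be previewed before
a run with a sketch like the one below.  ``preview_date_format`` is a
hypothetical helper written for this illustration; it is not part of
``pysyrev``.

.. code-block:: python

   from datetime import date


   def preview_date_format(pattern: str, day: date) -> str:
       """Render ``day`` with a strftime pattern (hypothetical helper)."""
       return day.strftime(pattern)


   # 5 March 2024 rendered day-first, as in the example configuration
   print(preview_date_format("%d/%m/%Y", date(2024, 3, 5)))  # 05/03/2024

Using a fixed date with a day below 13 makes it easy to tell whether the
pattern is day-first or month-first.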