Configuration reference
========================

All pipeline behaviour is driven by a single YAML file. Copy
``pysyrev/templates/config.yaml`` and fill in only the sections you need;
absent sections are skipped entirely.

.. code-block:: python

   # Load and run programmatically
   from pysyrev import Pipeline
   from pysyrev.core.config import Config

   # Optional: load the file to inspect the resolved configuration
   # (${VAR} references substituted, stage outputs auto-wired)
   cfg = Config.load("my_config.yaml")

   # Build the pipeline from the same file and run every configured stage
   Pipeline.from_config("my_config.yaml").run()

.. rubric:: Stage execution order

Sections are executed in canonical order, regardless of their order in the
YAML file:

.. code-block:: text

   bib → review → bib_network → topic_model → topic_report

.. rubric:: Output auto-wiring

When ``doc_dataset`` / ``run_dir`` fields are left blank, ``Config.load()``
automatically propagates outputs between stages:

- ``bib.export.export_dir`` → ``review.doc_dataset``
- ``review.export.export_dir`` → ``bib_network.doc_dataset``
- ``review.export.export_dir`` → ``topic_model.doc_dataset``
- ``topic_model.export.export_dir`` → ``topic_report.run_dir``

----

Root level
----------

``env``
   :Type: ``str``
   :Default: —
   :Required: no

   Path to a ``.env`` file. Any ``${VAR}`` reference found anywhere in the
   YAML is resolved against this file at load time. Variables already set in
   the process environment take precedence.

   .. code-block:: yaml

      env: /path/to/.env
      bib:
        wos:
          api:
            api_key: ${WOS_API_KEY}

----

``bib`` — bibliography collection
-----------------------------------

The ``bib`` section collects records from one or more bibliographic sources,
cleans and filters them, removes cross-source duplicates, and writes a
consolidated CSV.

``wos``
   :Type: mapping or ``str``
   :Default: —
   :Required: no

   Web of Science source. Can be a plain file path (shorthand for
   ``source: file``) or a structured block with the keys below.

   .. note::

      When ``source: file`` and ``file`` points to a **directory**, all
      ``.bib`` files in that directory are concatenated automatically. This
      handles WoS exports split into chunks of 500 or 1,000 records.
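
The directory case in the note above can be pictured with a short
standard-library sketch. It is an illustration, not pysyrev's code:
``concat_bib_dir`` is a hypothetical helper, and reading the chunks in sorted
filename order is an assumption.

.. code-block:: python

   from pathlib import Path


   def concat_bib_dir(path: str) -> str:
       """Return the text of one .bib file, or of every .bib file in a directory.

       Hypothetical helper; sorted filename order is an assumption.
       """
       p = Path(path)
       if p.is_file():
           return p.read_text(encoding="utf-8")
       # Directory of chunked exports: join every *.bib file, ignore other files
       chunks = sorted(p.glob("*.bib"))
       return "\n".join(chunk.read_text(encoding="utf-8") for chunk in chunks)

Because each BibTeX entry is self-delimiting (``@type{key, ...}``), joining
the chunks with a newline is enough, in this sketch, to obtain one parseable
bibliography.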

``wos.source``
   :Type: ``str``
   :Default: ``file``
   :Values: ``file`` | ``api``
   :Required: yes

   Whether to read from a local export file or from the WoS Expanded API.

``wos.file``
   :Type: ``str``
   :Default: —
   :Required: when ``source: file``

   Path to a ``.bib`` export file, or to a directory containing multiple
   ``.bib`` files (chunked WoS export).

``wos.api.api_key``
   :Type: ``str``
   :Default: —
   :Required: when ``source: api``

   WoS Expanded API key. Use ``${WOS_API_KEY}`` to read it from the ``.env``
   file.

``wos.api.query``
   :Type: ``str``
   :Default: —
   :Required: when ``source: api``

   WoS Query Language expression, e.g.
   ``'ALL=(agent-based model) AND PY=2015-2024'``.

``wos.api.cache_dir``
   :Type: ``str``
   :Default: ``null`` (no caching)
   :Required: no

   Local directory where raw API responses are cached. Subsequent runs with
   the same query read from disk instead of calling the API.

``open_alex``
   :Type: mapping or ``str``
   :Default: —
   :Required: no

   OpenAlex source. Can be a plain CSV file path or a structured block.

``open_alex.source``
   :Type: ``str``
   :Default: ``file``
   :Values: ``file`` | ``api``
   :Required: yes

``open_alex.file``
   :Type: ``str``
   :Default: —
   :Required: when ``source: file``

   Path to an OpenAlex CSV export.

``open_alex.api.api_key``
   :Type: ``str``
   :Default: —
   :Required: when ``source: api``

``open_alex.api.email``
   :Type: ``str``
   :Default: ``null``
   :Required: no

   Providing an e-mail address enables the OpenAlex *polite pool* (higher
   rate limits). Strongly recommended for non-trivial usage.

``open_alex.api.query``
   :Type: ``str``
   :Default: ``null``
   :Required: no (one of ``query`` or ``filters`` must be set)

   Free-text BM25 search on title and abstract.

``open_alex.api.filters``
   :Type: mapping
   :Default: ``null``
   :Required: no

   Structured OpenAlex filters, combined with ``AND``. Common keys:

   .. code-block:: yaml

      filters:
        publication_year: '2015-2024'
        type: article

``open_alex.api.cache_dir``
   :Type: ``str``
   :Default: ``null`` (no caching)
   :Required: no

``clean``
   :Type: mapping
   :Default: all defaults applied
   :Required: no

   Abstract quality filter applied before document extraction.

``clean.min_signals_to_reject``
   :Type: ``int``
   :Default: ``2``

   Number of garbage signals (boilerplate patterns, encoding artefacts, etc.)
   that must be detected before an abstract is dropped. Raising this value
   makes the filter more permissive.

``clean.extra_garbage_phrases``
   :Type: list of ``str``
   :Default: ``[]``

   Additional literal phrases that count as garbage signals.

``clean.use_langdetect``
   :Type: ``bool``
   :Default: ``false``

   When ``true``, records whose abstract language cannot be confirmed by
   ``langdetect`` are flagged. Leave disabled when processing multilingual
   corpora or when abstracts are absent.

``extract``
   :Type: mapping
   :Default: all defaults applied
   :Required: no

   Document-level filtering applied after cleaning.

``extract.year``
   :Type: ``int``
   :Default: ``1900``

   Minimum publication year (inclusive). Records published before this year
   are dropped.

``extract.language``
   :Type: ``str`` or list of ``str``
   :Default: ``null`` (keep all languages)

   Language or list of languages to keep, e.g. ``english`` or
   ``[english, french]``.

``extract.nb_citations``
   :Type: ``int``
   :Default: ``0``

   Minimum citation count (inclusive). Records with fewer citations are
   dropped.

``extract.include_doc_type``
   :Type: list of ``str``
   :Default: ``null`` (keep all types)

   Whitelist of document types to retain, e.g. ``[article, review]``.
   ``exclude_doc_type`` takes priority over this list.

``extract.exclude_doc_type``
   :Type: list of ``str``
   :Default: ``null``

   Document types to remove (fuzzy-matched), e.g.
   ``[peer review, retraction]``. Takes priority over ``include_doc_type``.
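
The year, citation and document-type rules above can be restated as a single
predicate. This is a minimal sketch, not pysyrev's implementation:
``keep_record`` and the record keys ``year``, ``citations`` and ``doc_type``
are hypothetical, the ``extract.language`` filter is omitted for brevity, and
a standard-library ``difflib`` ratio on token-sorted strings stands in for the
``rapidfuzz`` scorer configured by ``extract.scorer``.

.. code-block:: python

   from difflib import SequenceMatcher


   def fuzzy_score(a: str, b: str) -> float:
       """0-100 similarity of token-sorted, lower-cased strings.

       Stand-in for a rapidfuzz scorer; not the scorer pysyrev uses.
       """
       a = " ".join(sorted(a.lower().split()))
       b = " ".join(sorted(b.lower().split()))
       return 100 * SequenceMatcher(None, a, b).ratio()


   def matches_any(doc_type, candidates, cutoff):
       return any(fuzzy_score(doc_type, c) >= cutoff for c in candidates or [])


   def keep_record(rec, year=1900, nb_citations=0, include_doc_type=None,
                   exclude_doc_type=None, score_cutoff=90):
       """Hypothetical restatement of the ``extract`` rules for one record."""
       if rec["year"] < year or rec["citations"] < nb_citations:
           return False
       # Exclusion is checked first, so it wins over the whitelist
       if matches_any(rec["doc_type"], exclude_doc_type, score_cutoff):
           return False
       if include_doc_type is not None:
           return matches_any(rec["doc_type"], include_doc_type, score_cutoff)
       return True

Checking the exclusion list first is what gives ``exclude_doc_type`` priority
in this sketch: a record whose type matches both lists is dropped.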

``extract.scorer``
   :Type: ``str``
   :Default: ``partial_token_sort_ratio``
   :Values: any ``rapidfuzz`` scorer name

   Fuzzy scorer used for document-type matching.

``extract.score_cutoff``
   :Type: ``int``
   :Default: ``90``
   :Range: 0–100

   Minimum fuzzy score for a document type to match.

``merge``
   :Type: mapping
   :Default: all defaults applied
   :Required: no

   Cross-source duplicate removal (based on title similarity).

``merge.title_similarity``
   :Type: ``int``
   :Default: ``98``
   :Range: 0–100

   Fuzzy-match threshold for two titles to be considered duplicates. Lower
   values increase recall but risk false positives.

``merge.ngram_size``
   :Type: ``int``
   :Default: ``3``

   Character n-gram size used to build the candidate index.

``merge.max_candidates_per_row``
   :Type: ``int``
   :Default: ``200``

   Maximum number of candidate duplicates inspected per record. Increase for
   large corpora if recall is insufficient.

``merge.scorer``
   :Type: ``str``
   :Default: ``token_set_ratio``
   :Values: any ``rapidfuzz`` scorer name

``resolve_references``
   :Type: mapping
   :Default: ``enabled: false``
   :Required: no

   Cross-record reference resolution (opt-in, expensive). Links each cited
   reference string to a known record in the corpus.

``resolve_references.enabled``
   :Type: ``bool``
   :Default: ``false``

   Set to ``true`` to activate reference resolution. This step is
   computationally intensive on large corpora.

``resolve_references.flag_unresolved``
   :Type: ``bool``
   :Default: ``false``

   When ``true``, references that cannot be matched are annotated in the
   output rather than silently dropped.

``resolve_references.fuzzy_score_cutoff``
   :Type: ``int``
   :Default: ``90``
   :Range: 0–100

   Minimum fuzzy score for a reference string to be accepted as a match.

``resolve_references.ngram_size``
   :Type: ``int``
   :Default: ``3``

``resolve_references.max_candidates``
   :Type: ``int``
   :Default: ``50``

   Maximum candidate records examined per reference string.
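
Indexing titles by character n-grams avoids comparing every pair of records.
The sketch below illustrates the general technique behind
``merge.ngram_size``, ``merge.max_candidates_per_row`` and
``merge.title_similarity``; it is an assumption about how such an index is
used, not pysyrev's code. ``char_ngrams`` and ``find_duplicates`` are
hypothetical, candidates are ranked by the number of shared n-grams, and
``difflib.SequenceMatcher`` replaces the ``rapidfuzz`` ``token_set_ratio``
scorer.

.. code-block:: python

   from collections import defaultdict
   from difflib import SequenceMatcher


   def char_ngrams(text: str, n: int = 3) -> set:
       t = " ".join(text.lower().split())
       return {t[i:i + n] for i in range(max(len(t) - n + 1, 1))}


   def find_duplicates(titles, title_similarity=98, ngram_size=3,
                       max_candidates_per_row=200):
       """Return sorted index pairs (i, j), i < j, of likely duplicate titles."""
       grams = [char_ngrams(t, ngram_size) for t in titles]
       index = defaultdict(set)              # n-gram -> ids of titles containing it
       for i, g in enumerate(grams):
           for gram in g:
               index[gram].add(i)
       pairs = set()
       for i, g in enumerate(grams):
           # Count shared n-grams with every other title, keep the best candidates
           shared = defaultdict(int)
           for gram in g:
               for j in index[gram]:
                   if j != i:
                       shared[j] += 1
           best = sorted(shared, key=shared.get, reverse=True)[:max_candidates_per_row]
           for j in best:
               # Only the shortlisted candidates get the (expensive) fuzzy score
               score = 100 * SequenceMatcher(
                   None, titles[i].lower(), titles[j].lower()).ratio()
               if score >= title_similarity:
                   pairs.add((min(i, j), max(i, j)))
       return sorted(pairs)

With the default threshold of 98, ``"land use change"`` and
``"land-use change"`` (about 93 with this stand-in scorer) are kept apart;
lowering the threshold to 90 merges them, which is the recall versus
false-positive trade-off described for ``merge.title_similarity``.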

``resolve_references.scorer``
   :Type: ``str``
   :Default: ``token_set_ratio``
   :Values: any ``rapidfuzz`` scorer name

``export`` *(bib)*
   :Type: mapping
   :Required: yes

``export.export_dir``
   :Type: ``str``
   :Required: yes

   Parent directory for bib stage outputs. Each run is stored in a
   sub-directory ``//bib_dataset.csv``.

``export.run_name``
   :Type: ``str``
   :Default: ``null`` → auto-generated timestamp ``YYYY-MM-DDTHHMMSS``

   A human-readable label for the run, e.g. ``may_2026_wos_oa``. Re-using an
   existing name reopens that run directory.

----

``review`` — LLM-based title/abstract screening
-------------------------------------------------

Runs a multi-reviewer LLM workflow to decide whether each record should be
included in the review.

``review.doc_dataset``
   :Type: ``str``
   :Default: ``null`` (auto-detect latest bib run)

   Path to a ``bib_dataset.csv`` produced by the ``bib`` stage. Leave blank to
   pick up the most recent file in ``bib.export.export_dir`` automatically.

``review.text_inputs``
   :Type: list of ``str``
   :Default: —
   :Required: yes
   :Values: any subset of ``[title, abstract, keywords]``

   Fields sent to the LLM for each record.

``review.inclusion_criteria``
   :Type: ``str`` (multi-line)
   :Default: —
   :Required: yes

   Free-text description of what **must** be true for a document to be
   included. Passed verbatim to every reviewer.

``review.exclusion_criteria``
   :Type: ``str`` (multi-line)
   :Default: —
   :Required: yes

   Free-text list of reasons to **exclude** a document. Passed verbatim to
   every reviewer.

``review.decision_rule``
   :Type: ``str``
   :Default: ``majority``
   :Values: ``majority`` | ``mean``

   How individual reviewer verdicts are aggregated into a final decision.
   ``majority`` requires more than half of reviewers to agree; ``mean``
   averages their numerical scores.

``review.batch_size``
   :Type: ``int``
   :Default: ``100``

   Number of records processed between checkpoint saves. Smaller values
   reduce data loss on interruption; larger values reduce overhead.
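
One plausible reading of the two ``review.decision_rule`` values, sketched in
Python. The function ``aggregate``, the boolean verdicts, the per-reviewer
scores in [0, 1] and the 0.5 cut-off for ``mean`` are all assumptions for
illustration; pysyrev's own scoring scale may differ.

.. code-block:: python

   from statistics import mean


   def aggregate(verdicts, scores, decision_rule="majority", threshold=0.5):
       """Combine per-reviewer outputs into one include/exclude decision.

       verdicts: list of bools (True = include), one per reviewer.
       scores:   list of floats in [0, 1], one per reviewer.
       The 0.5 threshold for ``mean`` is an assumption.
       """
       if decision_rule == "majority":
           # Include only if strictly more than half of reviewers say include
           return sum(verdicts) > len(verdicts) / 2
       if decision_rule == "mean":
           return mean(scores) >= threshold
       raise ValueError(f"unknown decision_rule: {decision_rule!r}")

With two reviewers who split one vote each, ``majority`` returns ``False`` in
this sketch (one vote is not more than half), while ``mean`` can still include
the record if the average score reaches the cut-off.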

``review.api_pause``
   :Type: ``float``
   :Default: ``30.0``

   Pause in seconds between batches. Acts as a rate-limit guard for hosted
   APIs.

``review.sample_size``
   :Type: ``int``
   :Default: ``null`` (process full dataset)

   If set, a random sample of this size is drawn from the dataset. Useful for
   pilot runs.

``review.max_retries``
   :Type: ``int``
   :Default: ``null`` → module default (2)

   Section-level default for API call retries on error. Can be overridden per
   reviewer.

``review.max_concurrent_requests``
   :Type: ``int``
   :Default: ``null`` → module default (10)

   Section-level default for concurrent API requests. Keep 5–10 for
   Anthropic, up to 30 for OpenAI. Can be overridden per reviewer.

``review.items_per_call``
   :Type: ``int``
   :Default: ``null`` → module default (1)

   Number of records sent per API call. Sending several records per call
   reduces cost, because the backstory is sent only once per call. Can be
   overridden per reviewer.

``review.export`` *(review)*
   :Type: mapping
   :Required: yes

``export.export_dir``
   :Type: ``str``
   :Required: yes

   Parent directory for review outputs. Each run produces
   ``reviewed_included.csv`` and ``reviewed_total.csv``.

``export.run_name``
   :Type: ``str``
   :Default: ``null`` → auto-generated timestamp

``export.cache_dir``
   :Type: ``str``
   :Default: ``null`` → ``/cache/``

   Directory for LLM response caching between runs.

``review.workflow``
   :Type: list of round mappings
   :Required: yes

   Ordered list of screening rounds. Each round specifies a label and the
   reviewers that participate. Round N+1 only processes records for which
   round N produced no consensus.

   .. code-block:: yaml

      workflow:
        - round: A
          reviewers: [Reviewer1, Reviewer2]
        - round: B            # optional tie-breaker
          reviewers: [Reviewer3]

   ``round``
      :Type: ``str``

      Arbitrary label for the round (e.g. ``A``, ``B``, ``pilot``).

   ``reviewers``
      :Type: list of ``str``

      Names of reviewers participating in this round. Must match names
      declared in ``review.reviewers``.
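
The round-by-round escalation described for ``review.workflow`` can be
sketched as follows. ``run_workflow`` and ``review_fn`` are hypothetical
(``review_fn`` stands in for the LLM call), and treating "consensus" as a
unanimous verdict within a round is an assumption, not documented pysyrev
behaviour.

.. code-block:: python

   def run_workflow(records, workflow, review_fn):
       """Route records through screening rounds (hypothetical sketch).

       workflow:  list of {"round": label, "reviewers": [names]}, as in the YAML.
       review_fn: callable(reviewer_name, record) -> bool verdict.
       Returns {record: (verdict or None, deciding round label or None)}.
       """
       decisions = {}
       pending = list(records)
       for rnd in workflow:
           unresolved = []
           for rec in pending:
               votes = [review_fn(name, rec) for name in rnd["reviewers"]]
               if all(votes) or not any(votes):   # unanimous: consensus reached
                   decisions[rec] = (votes[0], rnd["round"])
               else:
                   unresolved.append(rec)
           pending = unresolved                   # only disagreements move on
           if not pending:
               break
       for rec in pending:                        # no consensus after last round
           decisions[rec] = (None, None)
       return decisions

In this sketch a tie-breaker round such as ``B`` is only called for records on
which the round ``A`` reviewers disagreed, so it adds API cost only for
contested records.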

``review.reviewers``
   :Type: list of reviewer mappings
   :Required: yes

   Each entry defines one LLM reviewer.

   ``name``
      :Type: ``str``
      :Required: yes

      Unique identifier for this reviewer. Referenced in ``workflow``.

   ``provider``
      :Type: ``str``
      :Required: yes
      :Values: ``anthropic`` | ``openai`` | ``litellm`` | ``ollama``

      LLM provider. Use ``litellm`` or ``ollama`` for custom or self-hosted
      endpoints.

   ``model_id``
      :Type: ``str``
      :Required: yes

      Model identifier as accepted by the provider, e.g. ``claude-haiku-4-5``
      or ``gpt-4o-mini``.

   ``max_tokens``
      :Type: ``int``
      :Required: yes

      Maximum number of tokens in the model's response. 200 is usually
      sufficient for a verdict and a brief justification.

   ``temperature``
      :Type: ``float``
      :Required: yes
      :Range: 0.0–2.0

      Sampling temperature. Lower values produce more deterministic verdicts;
      ``0.1`` is appropriate for conservative reviewers.

   ``backstory``
      :Type: ``str`` (multi-line)
      :Required: yes

      Reviewer persona: domain expertise, role, reviewing style. Injected as
      the system prompt.

   ``reasoning``
      :Type: ``str``
      :Default: ``brief``
      :Values: ``brief`` | ``cot``

      ``brief`` asks for a short justification; ``cot`` requests a full
      chain-of-thought before the verdict.

   ``host``
      :Type: ``str``
      :Default: ``null`` (use default hosted endpoint)

      Custom API endpoint. Required for ``litellm`` and ``ollama``; leave
      blank for Anthropic / OpenAI.

   ``reasoning_effort``
      :Type: ``str``
      :Default: ``null``
      :Values: ``low`` | ``medium`` | ``high``

      Extended-thinking effort level. Only applicable to models that support
      extended thinking (e.g. ``claude-sonnet-4-5``).

   ``additional_context``
      :Type: ``str``
      :Default: ``null``

      Extra context appended to each prompt, e.g. the verdicts of previous
      reviewers for a tie-breaker round.

   ``max_retries``
      :Type: ``int``
      :Default: ``null`` → section-level ``review.max_retries``

   ``max_concurrent_requests``
      :Type: ``int``
      :Default: ``null`` → section-level ``review.max_concurrent_requests``

   ``items_per_call``
      :Type: ``int``
      :Default: ``null`` → section-level ``review.items_per_call``

----

``bib_network`` — bibliographic networks
------------------------------------------

Builds bibliographic coupling and co-citation graphs from resolved and
unresolved reference lists. Outputs two GraphML files per run.

``bib_network.doc_dataset``
   :Type: ``str``
   :Default: ``null`` (auto-detect latest review run)

   Path to a ``reviewed_included.csv``. Leave blank to use the most recent
   file in ``review.export.export_dir``.

``bib_network.coupling_network``
   :Type: mapping
   :Default: all defaults applied

   Bibliographic coupling graph: two documents are linked if they cite at
   least one common reference.

``coupling_network.use_resolved``
   :Type: ``bool``
   :Default: ``false``

   Include edges based on resolved (matched) references.

``coupling_network.use_unresolved``
   :Type: ``bool``
   :Default: ``false``

   Include edges based on unresolved (raw string) references.

``coupling_network.min_shared``
   :Type: ``int``
   :Default: ``1``

   Minimum number of shared references required to draw an edge. Increase to
   reduce noise in dense corpora.

``bib_network.cocitation_network``
   :Type: mapping
   :Default: all defaults applied

   Co-citation graph: two documents are linked if they are cited together by
   at least one paper in the corpus.

``cocitation_network.use_resolved``
   :Type: ``bool``
   :Default: ``false``

``cocitation_network.use_unresolved``
   :Type: ``bool``
   :Default: ``false``

``cocitation_network.min_cocitations``
   :Type: ``int``
   :Default: ``1``

   Minimum co-occurrence count required to draw an edge.
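
Both graphs can be derived from per-document reference lists. The sketch
below is a minimal illustration under stated assumptions: ``coupling_edges``
and ``cocitation_edges`` are hypothetical helpers, ``refs_by_doc`` maps a
document ID to its reference keys (resolved IDs or raw strings), edge weights
are the shared-reference and co-citation counts, and GraphML export is left
out.

.. code-block:: python

   from collections import Counter
   from itertools import combinations


   def coupling_edges(refs_by_doc, min_shared=1):
       """Link two corpus documents citing at least ``min_shared`` common references."""
       edges = {}
       for (a, ra), (b, rb) in combinations(sorted(refs_by_doc.items()), 2):
           shared = len(set(ra) & set(rb))
           if shared >= min_shared:
               edges[(a, b)] = shared
       return edges


   def cocitation_edges(refs_by_doc, min_cocitations=1):
       """Link two references once for every corpus document that cites both."""
       counts = Counter()
       for refs in refs_by_doc.values():
           for pair in combinations(sorted(set(refs)), 2):
               counts[pair] += 1
       return {pair: n for pair, n in counts.items() if n >= min_cocitations}

For ``{"d1": ["x", "y", "z"], "d2": ["x", "y"], "d3": ["w"]}``, coupling links
``d1`` and ``d2`` with weight 2, and with ``min_cocitations: 2`` only the
reference pair ``x`` and ``y`` survives in the co-citation graph.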

``bib_network.export`` *(bib_network)*
   :Type: mapping
   :Required: yes

``export.export_dir``
   :Type: ``str``
   :Required: yes

``export.run_name``
   :Type: ``str``
   :Default: ``null`` → auto-generated timestamp

----

``topic_model`` — BERTopic clustering
---------------------------------------

Runs a grid search over HDBSCAN and UMAP hyperparameters using BERTopic,
scores each configuration, and writes the top ``keep_n_results``
configurations to ``best_results.csv``.

``topic_model.doc_dataset``
   :Type: ``str``
   :Default: ``null`` (auto-detect latest review run)

   Path to a ``reviewed_included.csv``. Leave blank to use the most recent
   file in ``review.export.export_dir``.

``topic_model.distance``
   :Type: ``str``
   :Default: ``euclidean``
   :Values: ``euclidean`` | ``chebyshev``

   Distance metric used to rank hyperparameter configurations in the
   multi-objective scoring space.

``topic_model.keep_n_results``
   :Type: ``int``
   :Default: ``10``

   Number of best-ranked configurations saved to ``best_results.csv``.

``topic_model.coherence_scorer``
   :Type: mapping
   :Default: all defaults applied

``coherence_scorer.ranking``
   :Type: ``str``
   :Default: ``u_mass``
   :Values: any ``gensim`` coherence measure

   Fast coherence metric used to rank all configurations in the grid.

``coherence_scorer.purity``
   :Type: ``str``
   :Default: ``c_v``
   :Values: any ``gensim`` coherence measure

   Slower, higher-quality metric applied only to the top-ranked
   configurations to compute a purity score.

``topic_model.hdbscan``
   :Type: mapping
   :Default: minimal grid (single point ``[2, 2]``)

   HDBSCAN grid-search parameters.

``hdbscan.min_topic_size_range``
   :Type: list of two ``int``
   :Default: ``[2, 2]``

   ``[min, max]`` bounds for the ``min_cluster_size`` grid.

``hdbscan.min_sample_range``
   :Type: list of two ``int``
   :Default: ``[2, 2]``

   ``[min, max]`` bounds for the ``min_samples`` grid.

``hdbscan.topic_size_step``
   :Type: ``int``
   :Default: ``1``

   Step size for the ``min_cluster_size`` axis of the grid.

``hdbscan.min_sample_step``
   :Type: ``int``
   :Default: ``1``

   Step size for the ``min_samples`` axis of the grid.
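
The HDBSCAN search space is the Cartesian product of the two ranges, and the
UMAP candidate lists (``umap.n_neighbors``, ``umap.n_components``) multiply it
further. This sketch assumes inclusive ``[min, max]`` bounds, which is
consistent with the default single-point grid ``[2, 2]`` but not otherwise
confirmed; ``inclusive_range`` and ``build_grid`` are hypothetical helpers.

.. code-block:: python

   from itertools import product


   def inclusive_range(bounds, step):
       lo, hi = bounds
       return list(range(lo, hi + 1, step))   # upper bound included (assumption)


   def build_grid(hdbscan, umap):
       """Enumerate (min_cluster_size, min_samples, n_neighbors, n_components)."""
       sizes = inclusive_range(hdbscan.get("min_topic_size_range", [2, 2]),
                               hdbscan.get("topic_size_step", 1))
       samples = inclusive_range(hdbscan.get("min_sample_range", [2, 2]),
                                 hdbscan.get("min_sample_step", 1))
       neighbors = umap.get("n_neighbors", [5])
       components = umap.get("n_components", [5])
       return list(product(sizes, samples, neighbors, components))

Grid size grows multiplicatively: three ``min_cluster_size`` values, two
``min_samples`` values and two ``n_neighbors`` values already give 12
configurations to fit and score.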

``hdbscan.cluster_selection_method``
   :Type: ``str``
   :Default: ``leaf``
   :Values: ``eom`` | ``leaf``

``hdbscan.metric``
   :Type: ``str``
   :Default: ``euclidean``

   Distance metric passed to HDBSCAN.

``hdbscan.prediction_data``
   :Type: ``bool``
   :Default: ``true``

   Precompute data structures for soft cluster membership prediction.

``topic_model.umap``
   :Type: mapping
   :Default: minimal grid (``n_neighbors: [5]``, ``n_components: [5]``)

   UMAP grid-search parameters. ``n_neighbors`` and ``n_components`` each
   accept a list of candidate values; all combinations are explored.

``umap.n_neighbors``
   :Type: list of ``int``
   :Default: ``[5]``

   Candidate values for UMAP ``n_neighbors``.

``umap.n_components``
   :Type: list of ``int``
   :Default: ``[5]``

   Candidate values for UMAP ``n_components`` (embedding dimensions passed to
   HDBSCAN).

``umap.metric``
   :Type: ``str``
   :Default: ``cosine``

   Distance metric used by UMAP.

``umap.min_dist``
   :Type: ``float``
   :Default: ``0.0``

   Controls how tightly UMAP packs points in the embedding. ``0.0`` is
   recommended for clustering.

``umap.low_memory``
   :Type: ``bool``
   :Default: ``false``

   Enable low-memory mode for very large corpora (slower).

``umap.random_state``
   :Type: ``int``
   :Default: ``42``

   Random seed for reproducibility.

``topic_model.bertopic``
   :Type: mapping
   :Default: all defaults applied

``bertopic.transformer_model``
   :Type: ``str``
   :Default: ``allenai/specter2_base``

   Hugging Face model identifier used to produce document embeddings.
   ``specter2_base`` is pre-trained on scientific text and is the recommended
   default for academic literature reviews.

``bertopic.n_gram_range``
   :Type: ``str``
   :Default: ``bigram``
   :Values: ``unigram`` | ``bigram``

   N-gram range for the c-TF-IDF vocabulary.

``bertopic.language``
   :Type: ``str``
   :Default: ``english``

``bertopic.calculate_probabilities``
   :Type: ``bool``
   :Default: ``true``

   Compute soft topic membership probabilities for each document. Required
   for topic distribution approximation.
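
Ranking by distance in a multi-objective space is commonly done against an
ideal point. The sketch below shows how the choice of
``topic_model.distance`` can change the order of configurations; the function
``rank_configs``, the scaling of every objective to [0, 1] (higher is better)
and the ideal point (1, ..., 1) are assumptions, not pysyrev's documented
scoring.

.. code-block:: python

   import math


   def rank_configs(scores, distance="euclidean", keep_n_results=10):
       """Rank configurations by distance to the ideal point (hypothetical sketch).

       scores: {config_name: tuple of objective values scaled to [0, 1]}.
       """
       def dist(vec):
           gaps = [1.0 - v for v in vec]
           if distance == "euclidean":
               return math.sqrt(sum(g * g for g in gaps))
           if distance == "chebyshev":
               return max(gaps)              # only the worst objective counts
           raise ValueError(f"unknown distance: {distance!r}")

       ranked = sorted(scores, key=lambda name: dist(scores[name]))
       return ranked[:keep_n_results]

For two configurations scored ``(1.0, 0.55)`` and ``(0.65, 0.65)``,
``euclidean`` prefers the first (one perfect objective offsets a weaker one),
whereas ``chebyshev`` prefers the balanced second because it looks only at the
worst objective.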

``topic_model.berteley``
   :Type: mapping
   :Default: all defaults applied

   Pre-processing options for the Berteley text normaliser.

``berteley.allow_abbrev``
   :Type: ``bool``
   :Default: ``false``

   Allow abbreviation expansion during tokenisation.

``topic_model.ctfidf``
   :Type: mapping
   :Default: all defaults applied

   c-TF-IDF weighting options.

``ctfidf.bm25_weighting``
   :Type: ``bool``
   :Default: ``true``

   Apply BM25-style term weighting to c-TF-IDF.

``ctfidf.reduce_frequent_words``
   :Type: ``bool``
   :Default: ``true``

   Down-weight terms that appear frequently across many topics.

``topic_model.topic_distribution``
   :Type: mapping
   :Default: all defaults applied

   Parameters for the sliding-window topic distribution approximation.

``topic_distribution.window``
   :Type: ``int``
   :Default: ``8``

   Sliding-window size (in tokens) for distribution approximation.

``topic_distribution.stride``
   :Type: ``int``
   :Default: ``1``

   Window stride.

``topic_distribution.min_similarity``
   :Type: ``float``
   :Default: ``0.1``

   Minimum cosine similarity for a window to contribute to a topic's
   distribution.

``topic_distribution.batch_size``
   :Type: ``int``
   :Default: ``1000``

   Documents processed per batch during distribution approximation.

``topic_model.export`` *(topic_model)*
   :Type: mapping
   :Required: yes

``export.export_dir``
   :Type: ``str``
   :Required: yes

``export.run_name``
   :Type: ``str``
   :Default: ``null`` → auto-generated timestamp

----

``topic_report`` — PDF report generation
------------------------------------------

Selects one model configuration from the topic-model results and generates a
PDF bibliographic report. Requires the ``report`` section for layout options
and, optionally, the ``llm`` section for topic label generation.

``topic_report.model_index``
   :Type: ``int``
   :Default: ``0``

   Row index in ``best_results.csv`` (0-based). ``0`` selects the
   highest-ranked model configuration.

``topic_report.export_to``
   :Type: ``str``
   :Required: yes

   Directory where the generated PDF is written.
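
Selecting a configuration with ``topic_report.model_index`` amounts to reading
one row of ``best_results.csv``. A minimal sketch, assuming the file is
already sorted from best to worst as the 0-based description above implies;
``select_model`` is a hypothetical helper and no particular column names are
assumed.

.. code-block:: python

   import csv


   def select_model(best_results_csv: str, model_index: int = 0) -> dict:
       """Return row ``model_index`` (0-based, header excluded) as a dict."""
       with open(best_results_csv, newline="", encoding="utf-8") as fh:
           rows = list(csv.DictReader(fh))
       return rows[model_index]

An out-of-range ``model_index`` raises ``IndexError`` in this sketch, which
keeps a mistyped index from silently falling back to another configuration.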

``topic_report.run_dir``
   :Type: ``str``
   :Default: ``null`` (auto-detect latest topic_model run)

   Path to a specific topic-model run directory. Leave blank to use the most
   recent run in ``topic_model.export.export_dir``.

----

``llm`` — topic label generation
----------------------------------

When present, an LLM generates human-readable labels for each topic
discovered by the topic-model stage. Used together with ``topic_report``.

``llm.provider``
   :Type: ``str``
   :Required: yes
   :Values: ``anthropic`` | ``openai`` | ``litellm`` | ``ollama``

``llm.model_id``
   :Type: ``str``
   :Required: yes

   Model identifier, e.g. ``claude-haiku-4-5-20251001``.

``llm.host``
   :Type: ``str``
   :Default: ``null`` (use default hosted endpoint)

   Custom endpoint for ``litellm`` or ``ollama``.

``llm.max_tokens``
   :Type: ``int``
   :Default: ``200``

``llm.temperature``
   :Type: ``float``
   :Default: ``0.3``

``llm.max_retries``
   :Type: ``int``
   :Default: ``2``

``llm.max_concurrent_requests``
   :Type: ``int``
   :Default: ``5``

``llm.n_repr_docs_for_labeling``
   :Type: ``int``
   :Default: ``3``

   Number of representative documents (closest to the topic centroid) sent to
   the LLM to generate each topic label.

``llm.system_prompt``
   :Type: ``str``
   :Default: ``null`` (built-in default prompt)

   Override the default system prompt for topic labelling.

----

``report`` — PDF layout
-------------------------

PDF layout and section parameters. All keys are optional; built-in defaults
are used for any omitted key.

``report.meta``
   :Type: mapping
   :Default: all defaults applied

``meta.title``
   :Type: ``str``
   :Default: ``Bibliographic report — Pysyrev``

``meta.subtitle``
   :Type: ``str``
   :Default: ``null``

``meta.author``
   :Type: ``str``
   :Default: ``Report generated with the pysyrev engine (v0.1)``

``meta.date_format``
   :Type: ``str``
   :Default: ``%d/%m/%Y``

   ``strftime``-compatible format string for the report date.
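
Because ``meta.date_format`` is an ordinary ``strftime`` pattern, its effect
can be checked with the standard library. ``format_report_date`` is a
hypothetical helper used only for illustration.

.. code-block:: python

   from datetime import date


   def format_report_date(d: date, date_format: str = "%d/%m/%Y") -> str:
       # Hypothetical helper: apply a meta.date_format pattern to a date
       return d.strftime(date_format)

For 4 May 2026, the default ``%d/%m/%Y`` yields ``04/05/2026``, while
``%Y-%m-%d`` yields ``2026-05-04``.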

``meta.version``
   :Type: ``str``
   :Default: ``1.0.0``

``meta.summary``
   :Type: ``str``
   :Default: ``null``

   Optional introductory paragraph shown on the cover page.

``report.sections``
   :Type: mapping
   :Default: all defaults applied

``sections.topics``
   ``topics.n_repr_docs_per_topic``
      :Type: ``int``
      :Default: ``5``

      Number of representative documents (closest to the topic centroid)
      displayed in the per-topic section.

``sections.bib_network``
   ``bib_network.enabled``
      :Type: ``str``
      :Default: ``auto``
      :Values: ``auto`` | ``true`` | ``false``

      Whether to include the bibliographic network graphs in the report.
      ``auto`` includes them when the ``bib_network`` stage was run and its
      outputs are detected.

``sections.temporal``
   ``temporal.variants``
      :Type: list of ``str``
      :Default: ``[absolute, cumulative, normalized, weighted]``
      :Values: any subset of ``absolute``, ``cumulative``, ``normalized``, ``weighted``

      Publication-trend chart variants included in the temporal analysis
      section.

``sections.topic_characteristics``
   ``topic_characteristics.n_top_cited_per_topic``
      :Type: ``int``
      :Default: ``5``

      Number of most-cited papers per topic used to compute citation impact
      scores.

   ``topic_characteristics.n_top_cited_global``
      :Type: ``int``
      :Default: ``50``

      Number of most-cited papers globally used to analyse topic distribution
      among highly cited documents.

``sections.topic_similarity``
   ``topic_similarity.clustering``
      :Type: ``bool``
      :Default: ``true``

      Reorder the similarity heatmap rows/columns by hierarchical clustering.

   ``topic_similarity.dendrogram``
      :Type: ``bool``
      :Default: ``true``

      Display a dendrogram alongside the heatmap.

``sections.paper_selection``
   ``paper_selection.min_year``
      :Type: ``int``
      :Default: ``2000``

      Only papers published from this year onward are eligible for the
      curated paper-selection section.

   ``paper_selection.proportion_per_topic``
      :Type: ``float``
      :Default: ``0.15``

      Fraction of each topic's documents included in the curated selection.

   ``paper_selection.selection_by``
      :Type: ``str``
      :Default: ``citations``
      :Values: ``citations`` | ``random``

      Criterion for selecting papers within each topic.

   ``paper_selection.export_annex``
      :Type: ``bool``
      :Default: ``true``

      Append a full reference list of selected papers as an annex.

   ``paper_selection.annex_format``
      :Type: ``str``
      :Default: ``csv``
      :Values: ``csv`` | ``txt``

      File format for the exported annex.
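
The ``paper_selection`` options combine as a year filter followed by a
per-topic pick. A minimal sketch under stated assumptions: ``select_papers``
and the record keys ``year`` and ``citations`` are hypothetical, the
proportion is applied to the papers that pass ``min_year`` and rounded up with
``math.ceil``, and ``random`` selection uses a fixed seed for reproducibility.

.. code-block:: python

   import math
   import random


   def select_papers(docs_by_topic, min_year=2000, proportion_per_topic=0.15,
                     selection_by="citations", seed=42):
       """Pick a curated subset of papers per topic (hypothetical sketch)."""
       rng = random.Random(seed)
       selection = {}
       for topic, docs in docs_by_topic.items():
           eligible = [d for d in docs if d["year"] >= min_year]
           k = math.ceil(proportion_per_topic * len(eligible))  # rounding: assumption
           if selection_by == "citations":
               chosen = sorted(eligible, key=lambda d: d["citations"],
                               reverse=True)[:k]
           elif selection_by == "random":
               chosen = rng.sample(eligible, k)
           else:
               raise ValueError(f"unknown selection_by: {selection_by!r}")
           selection[topic] = chosen
       return selection

Filtering by ``min_year`` before ranking means that a heavily cited but older
paper cannot crowd out recent work in this sketch.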