Configuration classes

These dataclasses mirror the YAML structure. They are produced by Config.load() and consumed by the runtime pipeline classes.

Root

class pysyrev.core.config.Config(env=None, bib=None, review=None, bib_network=None, topic_model=None, topic_report=None, report=None, llm=None, bib_network_graphs=None)[source]

Bases: object

Root configuration object.

All sections are optional; only the sections present in the YAML are executed. The canonical stage order is: bib, review, bib-network, topic-model, topic-report.

Config.load() propagates outputs between stages automatically when doc_dataset / run_dir are left blank, so a full-pipeline YAML requires no explicit cross-section paths.

Parameters:
env: None | str = None
bib: None | BibConfig = None
review: None | ReviewConfig = None
bib_network: None | BibNetworkConfig = None
topic_model: None | TopicModelConfig = None
topic_report: None | TopicReportConfig = None
report: None | ReportConfig = None
llm: None | TopicLabelerConfig = None
bib_network_graphs: None | BibNetworkReportConfig = None
classmethod load(config_file)[source]

Load a YAML config file.

Steps:
  1. Read the YAML.

  2. Load the .env file referenced by the root-level env: key (if any).

  3. Resolve all ${VAR} references.

  4. Propagate outputs between stages when doc_dataset / run_dir are left blank (auto-detection of the latest run in each export_dir).

  5. Build typed dataclasses for every section present.
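
Step 3 can be pictured with a small stand-alone sketch. It is not pysyrev's implementation: the helper name resolve_vars, the choice to leave unknown variables untouched, and the sample values are all assumptions made for illustration.

```python
import re
from typing import Any, Mapping

# Matches ${NAME} where NAME is a conventional environment-variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_vars(node: Any, env: Mapping[str, str]) -> Any:
    """Recursively substitute ${VAR} in every string of a parsed YAML tree.

    Hypothetical helper: unknown variables are left as-is here, which may
    differ from what Config.load() actually does.
    """
    if isinstance(node, str):
        return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), node)
    if isinstance(node, dict):
        return {key: resolve_vars(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_vars(item, env) for item in node]
    return node  # numbers, booleans and None pass through unchanged


# Example tree shaped like a parsed config; the values are made up.
raw = {
    "bib": {"export": {"export_dir": "${DATA_DIR}/bib"}},
    "llm": {"provider": "example-provider", "model_id": "${MODEL}"},
}
resolved = resolve_vars(raw, {"DATA_DIR": "/tmp/data"})
```

In this sketch the mapping is passed in explicitly; in practice it would typically be os.environ after the .env file from step 2 has been loaded.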

bib section

class pysyrev.core.config.BibConfig(wos=None, open_alex=None, scopus=None, pubmed=None, export=None, clean=None, extract=None, resolve_references=None, merge=None)[source]

Bases: ConfigField

Parameters:
wos: None | str | WosSourceConfig = None
open_alex: None | str | OpenAlexSourceConfig = None
scopus: None | str = None
pubmed: None | str = None
export: None | BibExportConfig = None
clean: CleanConfig = None
extract: ExtractConfig = None
resolve_references: ResolveReferencesConfig = None
merge: MergeConfig = None
class pysyrev.core.config.WosSourceConfig(source='file', file=None, api=None)[source]

Bases: ConfigField

One Web of Science (WoS) source: either a file path or an API config. Exactly one of file / api must be set.

Parameters:
source: str = 'file'
file: None | str = None
api: None | WosApiConfig = None
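
The "exactly one of file / api" rule can be expressed as a single comparison. The sketch below uses a hypothetical stand-in class, SourceSketch, and assumes the check runs at construction time; the documentation does not say where pysyrev performs it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceSketch:
    """Hypothetical stand-in for a file-or-API source section."""

    source: str = "file"
    file: Optional[str] = None
    api: Optional[dict] = None  # the real class uses WosApiConfig here

    def __post_init__(self) -> None:
        # Exactly one of the two fields may be set: reject both and neither.
        if (self.file is None) == (self.api is None):
            raise ValueError("set exactly one of 'file' or 'api'")


def is_valid(**kwargs) -> bool:
    """Return True when the keyword arguments form a valid source."""
    try:
        SourceSketch(**kwargs)
    except ValueError:
        return False
    return True
```

With this sketch, is_valid(file="records.txt") is accepted, while passing neither field or both fields is rejected.
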
class pysyrev.core.config.WosApiConfig(api_key, query, cache_dir=None)[source]

Bases: ConfigField

Configuration for retrieving WoS records via the Expanded API.

Parameters:
  • api_key (str)

  • query (str)

  • cache_dir (None | str)

api_key: str
query: str
cache_dir: None | str = None
class pysyrev.core.config.OpenAlexSourceConfig(source='file', file=None, api=None)[source]

Bases: ConfigField

One OpenAlex source: either a file path or an API config. Exactly one of file / api must be set.

Parameters:
source: str = 'file'
file: None | str = None
api: None | OpenAlexApiConfig = None
class pysyrev.core.config.OpenAlexApiConfig(api_key, email=None, query=None, filters=None, cache_dir=None)[source]

Bases: ConfigField

Configuration for retrieving works via the OpenAlex API.

At least one of query (free-text BM25) or filters (structured) must be set; the two can be combined. email enables the polite pool; it is optional but strongly recommended for non-trivial usage.

Parameters:
  • api_key (str)

  • email (None | str)

  • query (None | str)

  • filters (None | dict)

  • cache_dir (None | str)

api_key: str
email: None | str = None
query: None | str = None
filters: None | dict = None
cache_dir: None | str = None
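
As a rough illustration of how query, filters and email could translate into an OpenAlex request, the sketch below builds a works URL with the public search, filter and mailto parameters. The function build_works_url is hypothetical, api_key is deliberately left out, and how pysyrev itself maps these fields is not documented here.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

WORKS_ENDPOINT = "https://api.openalex.org/works"


def build_works_url(
    query: Optional[str] = None,
    filters: Optional[Dict[str, str]] = None,
    email: Optional[str] = None,
) -> str:
    """Build a works URL from the config fields (illustrative mapping only)."""
    if not query and not filters:
        raise ValueError("set 'query', 'filters', or both")
    params = {}
    if query:
        params["search"] = query  # free-text search
    if filters:
        # OpenAlex expects comma-separated key:value pairs in one parameter.
        params["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    if email:
        params["mailto"] = email  # opts in to the polite pool
    return f"{WORKS_ENDPOINT}?{urlencode(params)}"


# Made-up example values.
url = build_works_url(
    query="systematic review",
    filters={"publication_year": "2020", "type": "article"},
    email="you@example.org",
)
```
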
class pysyrev.core.config.CleanConfig(min_signals_to_reject=2, extra_garbage_phrases=None, use_langdetect=False)[source]

Bases: ConfigField

Parameters:
  • min_signals_to_reject (int)

  • extra_garbage_phrases (None | List[str])

  • use_langdetect (bool)

min_signals_to_reject: int = 2
extra_garbage_phrases: None | List[str] = None
use_langdetect: bool = False
class pysyrev.core.config.ExtractConfig(include_doc_type=None, exclude_doc_type=None, year=1900, nb_citations=0, language=None, scorer='partial_token_sort_ratio', score_cutoff=90)[source]

Bases: ConfigField

Parameters:
include_doc_type: None | List[str] = None
exclude_doc_type: None | List[str] = None
year: int = 1900
nb_citations: int = 0
language: None | str | List[str] = None
scorer: str = 'partial_token_sort_ratio'
score_cutoff: int = 90
class pysyrev.core.config.MergeConfig(title_similarity=98, ngram_size=3, max_candidates_per_row=200, scorer='token_set_ratio')[source]

Bases: ConfigField

Parameters:
  • title_similarity (int)

  • ngram_size (int)

  • max_candidates_per_row (int)

  • scorer (str)

title_similarity: int = 98
ngram_size: int = 3
max_candidates_per_row: int = 200
scorer: str = 'token_set_ratio'
class pysyrev.core.config.ResolveReferencesConfig(enabled=False, flag_unresolved=False, fuzzy_score_cutoff=90, ngram_size=3, max_candidates=50, scorer='token_set_ratio')[source]

Bases: ConfigField

Parameters:
  • enabled (bool)

  • flag_unresolved (bool)

  • fuzzy_score_cutoff (int)

  • ngram_size (int)

  • max_candidates (int)

  • scorer (str)

enabled: bool = False
flag_unresolved: bool = False
fuzzy_score_cutoff: int = 90
ngram_size: int = 3
max_candidates: int = 50
scorer: str = 'token_set_ratio'
class pysyrev.core.config.BibExportConfig(export_dir, run_name=None, dataset=None)[source]

Bases: ConfigField

Output configuration for the bib stage.

Each run writes its dataset to <export_dir>/<run_name>/bib_dataset.csv. Call resolve() (done automatically by BibDataset.from_config) to finalise the run directory and set dataset on the instance. Leave run_name blank to auto-generate a timestamp.

Parameters:
  • export_dir (str)

  • run_name (None | str)

  • dataset (str)

export_dir: str
run_name: None | str = None
dataset: str = None
resolve()[source]

Create the run directory and set the output CSV path.
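
A minimal sketch of what a resolve() step of this kind might do, written as a free function. The name resolve_run is hypothetical, and the timestamp pattern is the one documented for the review stage; that the bib stage uses the same pattern is an assumption.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_run(export_dir: str, run_name: Optional[str] = None) -> Path:
    """Create <export_dir>/<run_name>/ and return the dataset path inside it."""
    # A blank run_name falls back to a timestamp (assumed format, see above).
    name = run_name or datetime.now().strftime("%Y-%m-%dT%H%M%S")
    run_dir = Path(export_dir) / name
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir / "bib_dataset.csv"


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    named = resolve_run(tmp, "pilot")
    stamped = resolve_run(tmp)
    named_exists = named.parent.is_dir()
    stamped_name = stamped.parent.name
```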

review section

class pysyrev.core.config.ReviewConfig(export, text_inputs, inclusion_criteria, exclusion_criteria, reviewers, workflow, doc_dataset=None, batch_size=100, api_pause=30.0, decision_rule='majority', sample_size=None, max_retries=None, max_concurrent_requests=None, items_per_call=None)[source]

Bases: ConfigField

Parameters:
export: ReviewExportConfig
text_inputs: List[str]
inclusion_criteria: str
exclusion_criteria: str
reviewers: List[ReviewerConfig]
workflow: List[dict]
doc_dataset: None | str = None
batch_size: int = 100
api_pause: float = 30.0
decision_rule: str = 'majority'
sample_size: None | int = None
max_retries: None | int = None
max_concurrent_requests: None | int = None
items_per_call: None | int = None
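
The default decision_rule is 'majority'. The reference does not spell out its semantics, so the following is only one plausible reading: a document is included when more than half of the reviewers vote to include it. The function name and the tie handling are assumptions.

```python
from typing import Sequence


def majority_include(votes: Sequence[bool]) -> bool:
    """Include a document when more than half of the reviewers vote to include.

    Assumption: a tie counts as exclusion; pysyrev may break ties differently.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return sum(votes) * 2 > len(votes)
```

Under this reading, two of three include votes are enough, while a one-to-one split is not.
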
class pysyrev.core.config.ReviewerConfig(model_id, host, provider, name, max_tokens, temperature, reasoning_effort, backstory, additional_context, reasoning='brief', max_retries=None, max_concurrent_requests=None, items_per_call=None)[source]

Bases: ConfigField

Mirror of one entry under review.reviewers in the YAML. Cross-section fields like inclusion_criteria are NOT here — they live at the ReviewConfig level and are wired together by the runtime layer.

Parameters:
  • model_id (str)

  • host (str)

  • provider (str)

  • name (str)

  • max_tokens (int)

  • temperature (float)

  • reasoning_effort (str)

  • backstory (str)

  • additional_context (str)

  • reasoning (str)

  • max_retries (None | int)

  • max_concurrent_requests (None | int)

  • items_per_call (None | int)

model_id: str
host: str
provider: str
name: str
max_tokens: int
temperature: float
reasoning_effort: str
backstory: str
additional_context: str
reasoning: str = 'brief'
max_retries: None | int = None
max_concurrent_requests: None | int = None
items_per_call: None | int = None
class pysyrev.core.config.ReviewExportConfig(export_dir, run_name=None, cache_dir=None, included_docs=None, total_docs=None)[source]

Bases: ConfigField

Output configuration for the review stage.

Declare the parent directory (export_dir) and an optional run label (run_name). If run_name is left blank, resolve() generates a timestamp name (YYYY-MM-DDTHHMMSS) at run time so that successive test runs never overwrite each other.

resolve() must be called before the review runs (done automatically by LLMReview.from_config). It creates the run directory, sets included_docs / total_docs on the instance, and defaults cache_dir to <run_dir>/cache/ when not explicitly provided.

Downstream sections (bib_network, topic_model) can reference the output via config.review.export.included_docs after resolve(), or leave doc_dataset blank to have Config.load auto-detect the most recent run.

Parameters:
  • export_dir (str)

  • run_name (str)

  • cache_dir (str)

  • included_docs (str)

  • total_docs (str)

export_dir: str
run_name: str = None
cache_dir: str = None
included_docs: str = None
total_docs: str = None
resolve()[source]

Finalise run_name, create output directories, and set file paths.
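
The auto-detection of the most recent run could look like the sketch below. It assumes run folders carry sortable timestamp names, so the lexicographically largest name is the newest; the criterion Config.load actually applies (name, modification time, or something else) is not stated, and latest_run is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_run(export_dir: str) -> Optional[Path]:
    """Return the newest run directory under export_dir, or None if empty.

    Assumption: run folders use sortable timestamp names, so the
    lexicographically largest name is the most recent run.
    """
    runs = sorted(p for p in Path(export_dir).iterdir() if p.is_dir())
    return runs[-1] if runs else None


# Demonstration in a throwaway directory with made-up run names.
with tempfile.TemporaryDirectory() as tmp:
    empty = latest_run(tmp)
    for name in ("2024-01-05T101500", "2024-03-01T090000", "2024-02-10T235959"):
        (Path(tmp) / name).mkdir()
    newest = latest_run(tmp).name
```

Custom run_name values would break the ordering assumed here; sorting by modification time is a possible alternative.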

bib_network section

class pysyrev.core.config.BibNetworkConfig(doc_dataset=None, coupling_network=None, cocitation_network=None, export=None)[source]

Bases: ConfigField

Parameters:
doc_dataset: str = None
coupling_network: CouplingNetworkConfig = None
cocitation_network: CocitationNetworkConfig = None
export: None | BibNetworkExportConfig = None
class pysyrev.core.config.CouplingNetworkConfig(use_resolved=False, use_unresolved=False, min_shared=1)[source]

Bases: ConfigField

Parameters:
  • use_resolved (bool)

  • use_unresolved (bool)

  • min_shared (int)

use_resolved: bool = False
use_unresolved: bool = False
min_shared: int = 1
class pysyrev.core.config.CocitationNetworkConfig(use_resolved=False, use_unresolved=False, min_cocitations=1)[source]

Bases: ConfigField

Parameters:
  • use_resolved (bool)

  • use_unresolved (bool)

  • min_cocitations (int)

use_resolved: bool = False
use_unresolved: bool = False
min_cocitations: int = 1
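
To make the min_shared and min_cocitations thresholds concrete, here is a small sketch of the two standard constructions: bibliographic coupling links documents that cite common references, and co-citation links references that are cited together. The function names and toy data are illustrative, the use_resolved / use_unresolved switches are not modelled, and this is not pysyrev's implementation.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def coupling_edges(
    refs: Dict[str, Set[str]], min_shared: int = 1
) -> Dict[Edge, int]:
    """Link two documents when they cite at least min_shared common references."""
    edges = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


def cocitation_edges(
    refs: Dict[str, Set[str]], min_cocitations: int = 1
) -> Dict[Edge, int]:
    """Link two references when at least min_cocitations documents cite both."""
    counts = Counter()
    for cited in refs.values():
        counts.update(combinations(sorted(cited), 2))
    return {pair: n for pair, n in counts.items() if n >= min_cocitations}


# Toy corpus: document id -> set of cited reference ids (made-up data).
corpus = {
    "d1": {"r1", "r2", "r3"},
    "d2": {"r2", "r3"},
    "d3": {"r3", "r4"},
}
coupling = coupling_edges(corpus, min_shared=2)
cocitation = cocitation_edges(corpus, min_cocitations=2)
```

With a threshold of 2, only d1 and d2 remain coupled (they share r2 and r3), and only the pair r2, r3 remains co-cited.
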
class pysyrev.core.config.BibNetworkExportConfig(export_dir, run_name=None, coupling_graph=None, cocitation_graph=None)[source]

Bases: ConfigField

Output configuration for the bib_network stage.

Each run is stored in <export_dir>/<run_name>/. Leave run_name blank to auto-generate a timestamp. Call resolve() to finalise the run directory and set file paths.

Parameters:
  • export_dir (str)

  • run_name (None | str)

  • coupling_graph (None | str)

  • cocitation_graph (None | str)

export_dir: str
run_name: None | str = None
coupling_graph: None | str = None
cocitation_graph: None | str = None
resolve()[source]

Create the run directory and set output file paths.

topic_model section

class pysyrev.core.config.TopicModelConfig(export, doc_dataset=None, distance='euclidean', keep_n_results=10, coherence_scorer=None, hdbscan=None, umap=None, bertopic=None, berteley=None, ctfidf=None, topic_distribution=None)[source]

Bases: ConfigField

Parameters:
export: TopicExportConfig
doc_dataset: None | str = None
distance: str = 'euclidean'
keep_n_results: int = 10
coherence_scorer: CoherenceScorerConfig = None
hdbscan: HDBSCANConfig = None
umap: UMAPConfig = None
bertopic: BertopicConfig = None
berteley: BerteleyConfig = None
ctfidf: CTFIDFConfig = None
topic_distribution: TopicDistributionConfig = None
class pysyrev.core.config.HDBSCANConfig(min_topic_size_range=<factory>, min_sample_range=<factory>, topic_size_step=1, min_sample_step=1, cluster_selection_method='leaf', metric='euclidean', prediction_data=True)[source]

Bases: ConfigField

Parameters:
  • min_topic_size_range (List[int])

  • min_sample_range (List[int])

  • topic_size_step (int)

  • min_sample_step (int)

  • cluster_selection_method (str)

  • metric (str)

  • prediction_data (bool)

min_topic_size_range: List[int]
min_sample_range: List[int]
topic_size_step: int = 1
min_sample_step: int = 1
cluster_selection_method: str = 'leaf'
metric: str = 'euclidean'
prediction_data: bool = True
class pysyrev.core.config.UMAPConfig(n_neighbors=<factory>, n_components=<factory>, metric='cosine', min_dist=0.0, low_memory=False, random_state=42)[source]

Bases: ConfigField

Parameters:
n_neighbors: List[int]
n_components: List[int]
metric: str = 'cosine'
min_dist: float = 0.0
low_memory: bool = False
random_state: int = 42
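
The list-valued and range-valued fields of HDBSCANConfig and UMAPConfig suggest a hyperparameter sweep. The sketch below shows one way such a grid could be enumerated. It assumes each *_range is a [low, high] pair expanded inclusively with its step and that every combination is tried; neither is confirmed by this reference, and the helper names are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Tuple


def expand(bounds: List[int], step: int) -> List[int]:
    """Turn [low, high] into low, low + step, ... up to high (inclusive)."""
    low, high = bounds
    return list(range(low, high + 1, step))


def grid(
    min_topic_size_range: List[int],
    min_sample_range: List[int],
    n_neighbors: List[int],
    n_components: List[int],
    topic_size_step: int = 1,
    min_sample_step: int = 1,
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (min_topic_size, min_samples, n_neighbors, n_components)."""
    return product(
        expand(min_topic_size_range, topic_size_step),
        expand(min_sample_range, min_sample_step),
        n_neighbors,
        n_components,
    )


# Made-up example values.
candidates = list(
    grid(
        min_topic_size_range=[10, 20],
        min_sample_range=[5, 6],
        n_neighbors=[15, 30],
        n_components=[5],
        topic_size_step=5,
    )
)
```

Here the sweep contains 3 x 2 x 2 x 1 = 12 combinations, which shows how quickly the grid grows as ranges widen.
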
class pysyrev.core.config.BertopicConfig(transformer_model='allenai/specter2_base', n_gram_range='bigram', language='english', calculate_probabilities=True)[source]

Bases: ConfigField

Parameters:
  • transformer_model (str)

  • n_gram_range (str)

  • language (str)

  • calculate_probabilities (bool)

transformer_model: str = 'allenai/specter2_base'
n_gram_range: str = 'bigram'
language: str = 'english'
calculate_probabilities: bool = True
class pysyrev.core.config.TopicExportConfig(export_dir, run_name=None)[source]

Bases: ConfigField

Output configuration for the topic-model stage.

Each run is stored in its own sub-directory: <export_dir>/<run_name>/. Leave run_name blank to auto-generate a timestamp at run time (directory creation is deferred to TopicModel.run()).

Parameters:
  • export_dir (str)

  • run_name (None | str)

export_dir: str
run_name: None | str = None

topic_report / llm / report sections

class pysyrev.core.config.TopicReportConfig(run_dir=None, model_index=0, export_to=None)[source]

Bases: ConfigField

Model-selection parameters for the topic-report stage.

Parameters:
  • run_dir (str)

  • model_index (int)

  • export_to (str)

run_dir: str = None
model_index: int = 0
export_to: str = None
class pysyrev.core.config.TopicLabelerConfig(provider, model_id, host=None, max_tokens=200, temperature=0.3, max_retries=2, max_concurrent_requests=5, n_repr_docs_for_labeling=3, system_prompt=None)[source]

Bases: ConfigField

LLM configuration for generating human-readable topic labels.

Parameters:
  • provider (str)

  • model_id (str)

  • host (None | str)

  • max_tokens (int)

  • temperature (float)

  • max_retries (int)

  • max_concurrent_requests (int)

  • n_repr_docs_for_labeling (int)

  • system_prompt (None | str)

provider: str
model_id: str
host: None | str = None
max_tokens: int = 200
temperature: float = 0.3
max_retries: int = 2
max_concurrent_requests: int = 5
n_repr_docs_for_labeling: int = 3
system_prompt: None | str = None
class pysyrev.core.config.ReportConfig(meta=None, sections=None)[source]

Bases: ConfigField

Parameters:
meta: None | ReportMetaConfig = None
sections: None | ReportSectionsConfig = None
class pysyrev.core.config.ReportMetaConfig(title='Bibliographic report Pysyrev', subtitle=None, author='Report generated with the pysyrev engine (v0.1)', date_format='%d/%m/%Y', version='1.0.0', summary=None)[source]

Bases: ConfigField

Parameters:
  • title (str)

  • subtitle (None | str)

  • author (str)

  • date_format (str)

  • version (str)

  • summary (None | str)

title: str = 'Bibliographic report Pysyrev'
subtitle: None | str = None
author: str = 'Report generated with the pysyrev engine (v0.1)'
date_format: str = '%d/%m/%Y'
version: str = '1.0.0'
summary: None | str = None
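
The default date_format looks like a strftime pattern. Assuming the report passes it to strftime (which this reference does not confirm), the default and an alternative would render a date as follows.

```python
from datetime import date

DEFAULT_DATE_FORMAT = "%d/%m/%Y"  # default of ReportMetaConfig.date_format

sample = date(2024, 3, 5)
default_style = sample.strftime(DEFAULT_DATE_FORMAT)  # day/month/year
iso_style = sample.strftime("%Y-%m-%d")  # an alternative pattern
```

For 5 March 2024 the default pattern gives 05/03/2024 and the alternative gives 2024-03-05.
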
class pysyrev.core.config.ReportSectionsConfig(topics=None, bib_network=None, temporal=None, topic_characteristics=None, topic_similarity=None, paper_selection=None, extra=None)[source]

Bases: ConfigField

Parameters:
  • topics (TopicsSectionConfig)

  • bib_network (BibNetworkSectionConfig)

  • temporal (TemporalSectionConfig)

  • topic_characteristics (TopicCharacteristicsConfig)

  • topic_similarity (TopicSimilarityConfig)

  • paper_selection (PaperSelectionConfig)

  • extra (None | List[dict])

topics: TopicsSectionConfig = None
bib_network: BibNetworkSectionConfig = None
temporal: TemporalSectionConfig = None
topic_characteristics: TopicCharacteristicsConfig = None
topic_similarity: TopicSimilarityConfig = None
paper_selection: PaperSelectionConfig = None
extra: None | List[dict] = None