Configuration classes

These dataclasses mirror the YAML structure. They are produced by Config.load() and consumed by the runtime pipeline classes.

Root

class pysyrev.core.config.Config(env=None, bib=None, review=None, bib_network=None, topic_model=None, topic_report=None, report=None, llm=None, bib_network_graphs=None)

Bases: object

Root configuration object.

All sections are optional: only the sections present in the YAML are executed. The canonical stage order is bib → review → bib_network → topic_model → topic_report.

Config.load() propagates outputs between stages automatically when doc_dataset / run_dir are left blank, so a full-pipeline YAML requires no explicit cross-section paths.
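As an illustration of how "only the sections present are executed" combines with the canonical order, here is a minimal sketch; it is not pysyrev's own code. It assumes the root-level YAML keys use the same underscore spellings as the Config attributes. STAGE_ORDER and stages_to_run are hypothetical names.

```python
from typing import Any, Dict, List

# Canonical stage order; the underscore spellings are assumed to match the
# root-level YAML keys (i.e. the Config attribute names).
STAGE_ORDER = ["bib", "review", "bib_network", "topic_model", "topic_report"]


def stages_to_run(raw_config: Dict[str, Any]) -> List[str]:
    """Return the stages present in a parsed YAML mapping, in canonical order.

    A key whose value is None (e.g. `bib:` with nothing under it in YAML)
    is treated as absent.
    """
    return [stage for stage in STAGE_ORDER if raw_config.get(stage) is not None]


raw = {
    "env": ".env",
    "topic_model": {"export": {"export_dir": "out/topics"}},
    "review": {"export": {"export_dir": "out/review"}},
}
print(stages_to_run(raw))  # ['review', 'topic_model']
```

The YAML key order is irrelevant: the stages always come back in canonical order.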

Parameters:

env (None | str)
bib (None | BibConfig)
review (None | ReviewConfig)
bib_network (None | BibNetworkConfig)
topic_model (None | TopicModelConfig)
topic_report (None | TopicReportConfig)
report (None | ReportConfig)
llm (None | TopicLabelerConfig)
bib_network_graphs (None | BibNetworkReportConfig)

env: None | str = None

bib: None | BibConfig = None

review: None | ReviewConfig = None

bib_network: None | BibNetworkConfig = None

topic_model: None | TopicModelConfig = None

topic_report: None | TopicReportConfig = None

report: None | ReportConfig = None

llm: None | TopicLabelerConfig = None

bib_network_graphs: None | BibNetworkReportConfig = None

classmethod load(config_file)

Load a YAML config file.

Steps:

1. Read the YAML.
2. Load the .env file referenced by the root-level env: key (if any).
3. Resolve all ${VAR} references.
4. Propagate outputs between stages when doc_dataset / run_dir are left blank (auto-detection of the latest run in each export_dir).
5. Build typed dataclasses for every section present.
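Step 3 can be sketched as a recursive substitution over the parsed YAML. This is a hypothetical helper (resolve_vars is not part of the documented API). It assumes step 2 has already placed the .env values in the environment, and it leaves unknown variables untouched, whereas the real loader may raise an error instead.

```python
import os
import re
from typing import Any, Mapping

# Matches ${NAME}, where NAME is a conventional environment-variable identifier.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_vars(node: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${VAR} references in strings nested in dicts and lists.

    Unknown variables are left as-is in this sketch.
    """
    if isinstance(node, str):
        return _VAR_PATTERN.sub(lambda m: env.get(m.group(1), m.group(0)), node)
    if isinstance(node, dict):
        return {key: resolve_vars(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_vars(item, env) for item in node]
    return node  # ints, floats, bools and None pass through unchanged


raw = {
    "review": {
        "reviewers": [{"host": "${LLM_HOST}", "max_tokens": 512}],
        "export": {"export_dir": "${DATA_ROOT}/review"},
    }
}
env = {"LLM_HOST": "http://localhost:11434", "DATA_ROOT": "/data"}
resolved = resolve_vars(raw, env)
print(resolved["review"]["export"]["export_dir"])  # /data/review
```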

bib section

class pysyrev.core.config.BibConfig(wos=None, open_alex=None, scopus=None, pubmed=None, export=None, clean=None, extract=None, resolve_references=None, merge=None)

Bases: ConfigField

Parameters:

wos (None | str | WosSourceConfig)
open_alex (None | str | OpenAlexSourceConfig)
scopus (None | str)
pubmed (None | str)
export (None | BibExportConfig)
clean (CleanConfig)
extract (ExtractConfig)
resolve_references (ResolveReferencesConfig)
merge (MergeConfig)

wos: None | str | WosSourceConfig = None

open_alex: None | str | OpenAlexSourceConfig = None

scopus: None | str = None

pubmed: None | str = None

export: None | BibExportConfig = None

clean: CleanConfig = None

extract: ExtractConfig = None

resolve_references: ResolveReferencesConfig = None

merge: MergeConfig = None

class pysyrev.core.config.WosSourceConfig(source='file', file=None, api=None)

Bases: ConfigField

One WoS source: either a file path, or an API config. Exactly one of file / api must be set.

Parameters:

source (str)
file (None | str)
api (None | WosApiConfig)

source: str = 'file'

file: None | str = None

api: None | WosApiConfig = None
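The "exactly one of file / api" rule can be enforced in a dataclass __post_init__. The classes below are hypothetical stand-ins written for illustration, not pysyrev's WosSourceConfig. How the source field interacts with file / api is not documented here, so the sketch ignores it. The same pattern applies to OpenAlexSourceConfig.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WosApiSketch:
    """Hypothetical stand-in for WosApiConfig (required fields only)."""
    api_key: str
    query: str


@dataclass
class WosSourceSketch:
    """Hypothetical stand-in for WosSourceConfig that enforces the file / api rule."""
    source: str = "file"
    file: Optional[str] = None
    api: Optional[WosApiSketch] = None

    def __post_init__(self) -> None:
        # True/False add up as 1/0, giving the number of inputs provided.
        n_set = (self.file is not None) + (self.api is not None)
        if n_set != 1:
            raise ValueError(f"exactly one of 'file' or 'api' must be set, got {n_set}")


ok = WosSourceSketch(file="data/wos_export.txt")  # valid: file only (illustrative path)
try:
    WosSourceSketch()  # invalid: neither file nor api
except ValueError as err:
    print(err)  # exactly one of 'file' or 'api' must be set, got 0
```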

class pysyrev.core.config.WosApiConfig(api_key, query, cache_dir=None)

Bases: ConfigField

Configuration for retrieving WoS records via the Expanded API.

Parameters:

api_key (str)
query (str)
cache_dir (None | str)

api_key: str

query: str

cache_dir: None | str = None

class pysyrev.core.config.OpenAlexSourceConfig(source='file', file=None, api=None)

Bases: ConfigField

One OpenAlex source: either a file path, or an API config. Exactly one of file / api must be set.

Parameters:

source (str)
file (None | str)
api (None | OpenAlexApiConfig)

source: str = 'file'

file: None | str = None

api: None | OpenAlexApiConfig = None

class pysyrev.core.config.OpenAlexApiConfig(api_key, email=None, query=None, filters=None, cache_dir=None)

Bases: ConfigField

Configuration for retrieving works via the OpenAlex API.

At least one of query (free-text BM25 search) or filters (structured) must be set; the two can also be combined. Setting email enables OpenAlex's polite pool. It is optional but strongly recommended for non-trivial usage.

Parameters:

api_key (str)
email (None | str)
query (None | str)
filters (None | dict)
cache_dir (None | str)

api_key: str

email: None | str = None

query: None | str = None

filters: None | dict = None

cache_dir: None | str = None
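To show how these fields could translate into an HTTP request, the sketch below builds an OpenAlex /works URL using the API's search, filter and mailto query parameters. The mapping is an assumption and is not pysyrev's request code. In particular, passing api_key as a query parameter and the flat {attribute: value} shape of filters are both assumed. build_works_url is a hypothetical name.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_works_url(
    api_key: str,
    email: Optional[str] = None,
    query: Optional[str] = None,
    filters: Optional[Dict[str, str]] = None,
) -> str:
    """Map OpenAlexApiConfig-like fields onto an OpenAlex /works request URL."""
    if not query and not filters:
        raise ValueError("set at least one of 'query' or 'filters'")
    params: Dict[str, str] = {}
    if query:
        params["search"] = query  # free-text search
    if filters:
        # OpenAlex expects filters as comma-separated "attribute:value" pairs.
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    if email:
        params["mailto"] = email  # identifies the caller, routing requests to the polite pool
    params["api_key"] = api_key  # assumption: key sent as a query parameter
    return f"{OPENALEX_WORKS_URL}?{urlencode(params)}"


url = build_works_url(
    api_key="MY_KEY",
    email="me@example.org",
    query="systematic review automation",
    filters={"publication_year": "2020-2024", "type": "article"},
)
print(url)
```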

class pysyrev.core.config.CleanConfig(min_signals_to_reject=2, extra_garbage_phrases=None, use_langdetect=False)

Bases: ConfigField

Parameters:

min_signals_to_reject (int)
extra_garbage_phrases (None | List[str])
use_langdetect (bool)

min_signals_to_reject: int = 2

extra_garbage_phrases: None | List[str] = None

use_langdetect: bool = False

class pysyrev.core.config.ExtractConfig(include_doc_type=None, exclude_doc_type=None, year=1900, nb_citations=0, language=None, scorer='partial_token_sort_ratio', score_cutoff=90)

Bases: ConfigField

Parameters:

include_doc_type (None | List[str])
exclude_doc_type (None | List[str])
year (int)
nb_citations (int)
language (None | str | List[str])
scorer (str)
score_cutoff (int)

include_doc_type: None | List[str] = None

exclude_doc_type: None | List[str] = None

year: int = 1900

nb_citations: int = 0

language: None | str | List[str] = None

scorer: str = 'partial_token_sort_ratio'

score_cutoff: int = 90

class pysyrev.core.config.MergeConfig(title_similarity=98, ngram_size=3, max_candidates_per_row=200, scorer='token_set_ratio')

Bases: ConfigField

Parameters:

title_similarity (int)
ngram_size (int)
max_candidates_per_row (int)
scorer (str)

title_similarity: int = 98

ngram_size: int = 3

max_candidates_per_row: int = 200

scorer: str = 'token_set_ratio'

class pysyrev.core.config.ResolveReferencesConfig(enabled=False, flag_unresolved=False, fuzzy_score_cutoff=90, ngram_size=3, max_candidates=50, scorer='token_set_ratio')

Bases: ConfigField

Parameters:

enabled (bool)
flag_unresolved (bool)
fuzzy_score_cutoff (int)
ngram_size (int)
max_candidates (int)
scorer (str)

enabled: bool = False

flag_unresolved: bool = False

fuzzy_score_cutoff: int = 90

ngram_size: int = 3

max_candidates: int = 50

scorer: str = 'token_set_ratio'

class pysyrev.core.config.BibExportConfig(export_dir, run_name=None, dataset=None)

Bases: ConfigField

Output configuration for the bib stage.

Each run is stored in <export_dir>/<run_name>/bib_dataset.csv. Call resolve() (done automatically by BibDataset.from_config) to finalise the run directory and set dataset on the instance. Leave run_name blank to auto-generate a timestamp.

Parameters:

export_dir (str)
run_name (None | str)
dataset (str)

export_dir: str

run_name: None | str = None

dataset: str = None

resolve()

Create the run directory and set the output CSV path.
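Here is a minimal sketch of what resolve() does for this stage, written as a hypothetical stand-in class (BibExportSketch), not the real BibExportConfig. The bib_dataset.csv file name comes from the description above. The timestamp format is assumed to match the YYYY-MM-DDTHHMMSS format documented for ReviewExportConfig.

```python
import os
import tempfile
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BibExportSketch:
    """Hypothetical mirror of BibExportConfig, used only to illustrate resolve()."""
    export_dir: str
    run_name: Optional[str] = None
    dataset: Optional[str] = None

    def resolve(self) -> None:
        if not self.run_name:
            # Blank run_name: generate a timestamp (format assumed, see above).
            self.run_name = datetime.now().strftime("%Y-%m-%dT%H%M%S")
        run_dir = os.path.join(self.export_dir, self.run_name)
        os.makedirs(run_dir, exist_ok=True)  # creates parents; no error if it exists
        self.dataset = os.path.join(run_dir, "bib_dataset.csv")


root = tempfile.mkdtemp()  # throwaway export root for the demo
cfg = BibExportSketch(export_dir=os.path.join(root, "bib"), run_name="pilot")
cfg.resolve()
print(cfg.dataset)  # .../bib/pilot/bib_dataset.csv
```

Because makedirs uses exist_ok=True, calling resolve() twice on the same instance is harmless.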

review section

class pysyrev.core.config.ReviewConfig(export, text_inputs, inclusion_criteria, exclusion_criteria, reviewers, workflow, doc_dataset=None, batch_size=100, api_pause=30.0, decision_rule='majority', sample_size=None, max_retries=None, max_concurrent_requests=None, items_per_call=None)

Bases: ConfigField

Parameters:

export (ReviewExportConfig)
text_inputs (List[str])
inclusion_criteria (str)
exclusion_criteria (str)
reviewers (List[ReviewerConfig])
workflow (List[dict])
doc_dataset (None | str)
batch_size (int)
api_pause (float)
decision_rule (str)
sample_size (None | int)
max_retries (None | int)
max_concurrent_requests (None | int)
items_per_call (None | int)

export: ReviewExportConfig

text_inputs: List[str]

inclusion_criteria: str

exclusion_criteria: str

reviewers: List[ReviewerConfig]

workflow: List[dict]

doc_dataset: None | str = None

batch_size: int = 100

api_pause: float = 30.0

decision_rule: str = 'majority'

sample_size: None | int = None

max_retries: None | int = None

max_concurrent_requests: None | int = None

items_per_call: None | int = None

class pysyrev.core.config.ReviewerConfig(model_id, host, provider, name, max_tokens, temperature, reasoning_effort, backstory, additional_context, reasoning='brief', max_retries=None, max_concurrent_requests=None, items_per_call=None)

Bases: ConfigField

Mirror of one entry under review.reviewers in the YAML. Cross-section fields such as inclusion_criteria are NOT defined here: they live at the ReviewConfig level and are wired together by the runtime layer.

Parameters:

model_id (str)
host (str)
provider (str)
name (str)
max_tokens (int)
temperature (float)
reasoning_effort (str)
backstory (str)
additional_context (str)
reasoning (str)
max_retries (None | int)
max_concurrent_requests (None | int)
items_per_call (None | int)

model_id: str

host: str

provider: str

name: str

max_tokens: int

temperature: float

reasoning_effort: str

backstory: str

additional_context: str

reasoning: str = 'brief'

max_retries: None | int = None

max_concurrent_requests: None | int = None

items_per_call: None | int = None
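The wiring step can be pictured as merging each reviewer entry with the criteria shared at the ReviewConfig level. The sketch also assumes that per-reviewer max_retries / items_per_call (default None) fall back to the review-level values when unset. That precedence is inferred from the duplicated fields and is not documented behaviour. All class and function names, and the model IDs, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ReviewerSketch:
    """Hypothetical, trimmed stand-in for one review.reviewers entry."""
    name: str
    model_id: str
    max_retries: Optional[int] = None
    items_per_call: Optional[int] = None


@dataclass
class ReviewSketch:
    """Hypothetical, trimmed stand-in for ReviewConfig."""
    inclusion_criteria: str
    exclusion_criteria: str
    reviewers: List[ReviewerSketch] = field(default_factory=list)
    max_retries: Optional[int] = None
    items_per_call: Optional[int] = None


def effective(reviewer: ReviewerSketch, review: ReviewSketch, setting: str) -> Any:
    """Per-reviewer value when set, otherwise the review-level value (assumed precedence)."""
    value = getattr(reviewer, setting)
    return value if value is not None else getattr(review, setting)


def wire_reviewer(reviewer: ReviewerSketch, review: ReviewSketch) -> Dict[str, Any]:
    """Attach the criteria shared by all reviewers to one reviewer's settings."""
    return {
        "name": reviewer.name,
        "model_id": reviewer.model_id,
        "inclusion_criteria": review.inclusion_criteria,
        "exclusion_criteria": review.exclusion_criteria,
        "max_retries": effective(reviewer, review, "max_retries"),
        "items_per_call": effective(reviewer, review, "items_per_call"),
    }


review = ReviewSketch(
    inclusion_criteria="Empirical studies reporting outcome X.",
    exclusion_criteria="Editorials, commentaries and conference abstracts.",
    reviewers=[
        ReviewerSketch(name="strict", model_id="model-a", max_retries=5),
        ReviewerSketch(name="lenient", model_id="model-b"),
    ],
    max_retries=2,
    items_per_call=10,
)
for reviewer in review.reviewers:
    wired = wire_reviewer(reviewer, review)
    print(wired["name"], wired["max_retries"], wired["items_per_call"])
# strict 5 10
# lenient 2 10
```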

class pysyrev.core.config.ReviewExportConfig(export_dir, run_name=None, cache_dir=None, included_docs=None, total_docs=None)

Bases: ConfigField

Output configuration for the review stage.

Declare the parent directory (export_dir) and an optional run label (run_name). If run_name is left blank, resolve() generates a timestamp name (YYYY-MM-DDTHHMMSS) at run time so that successive test runs never overwrite each other.

resolve() must be called before the review runs (done automatically by LLMReview.from_config). It creates the run directory, sets included_docs / total_docs on the instance, and defaults cache_dir to <run_dir>/cache/ when not explicitly provided.

Downstream sections (bib_network, topic_model) can reference the output via config.review.export.included_docs after resolve(), or leave doc_dataset blank to have Config.load auto-detect the most recent run.
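The auto-detection can be sketched as picking the newest timestamp-named directory under export_dir. This is a hypothetical helper (latest_run_dir). It relies on zero-padded YYYY-MM-DDTHHMMSS names sorting chronologically and ignores runs with a custom run_name, whereas Config.load may use a different criterion. Config.load would then point doc_dataset at the included-documents file inside that directory. Because that file name is not documented here, the sketch stops at the directory.

```python
import os
import re
import tempfile
from typing import Optional

# Run names generated by resolve() look like 2024-05-03T141500.
_TIMESTAMP_RUN = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{6}$")


def latest_run_dir(export_dir: str) -> Optional[str]:
    """Return the newest timestamp-named run directory in export_dir, or None."""
    if not os.path.isdir(export_dir):
        return None
    with os.scandir(export_dir) as entries:
        runs = sorted(
            entry.name
            for entry in entries
            if entry.is_dir() and _TIMESTAMP_RUN.match(entry.name)
        )
    # Zero-padded timestamps sort chronologically as plain strings.
    return os.path.join(export_dir, runs[-1]) if runs else None


root = tempfile.mkdtemp()
for name in ("2024-05-01T090000", "2024-05-03T141500", "pilot"):
    os.makedirs(os.path.join(root, name))
print(os.path.basename(latest_run_dir(root)))  # 2024-05-03T141500
```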

Parameters:

export_dir (str)
run_name (str)
cache_dir (str)
included_docs (str)
total_docs (str)

export_dir: str

run_name: str = None

cache_dir: str = None

included_docs: str = None

total_docs: str = None

resolve()

Finalise run_name, create output directories, and set file paths.

bib_network section

class pysyrev.core.config.BibNetworkConfig(doc_dataset=None, coupling_network=None, cocitation_network=None, export=None)

Bases: ConfigField

Parameters:

doc_dataset (str)
coupling_network (CouplingNetworkConfig)
cocitation_network (CocitationNetworkConfig)
export (None | BibNetworkExportConfig)

doc_dataset: str = None

coupling_network: CouplingNetworkConfig = None

cocitation_network: CocitationNetworkConfig = None

export: None | BibNetworkExportConfig = None

class pysyrev.core.config.CouplingNetworkConfig(use_resolved=False, use_unresolved=False, min_shared=1)

Bases: ConfigField

Parameters:

use_resolved (bool)
use_unresolved (bool)
min_shared (int)

use_resolved: bool = False

use_unresolved: bool = False

min_shared: int = 1
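CouplingNetworkConfig presumably configures a bibliographic-coupling network, in which two documents are linked when their reference lists overlap. The sketch below reads min_shared as the minimum number of shared references required for an edge. That interpretation and the hypothetical coupling_edges helper are assumptions. Which references are counted (use_resolved / use_unresolved) is not modelled.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def coupling_edges(
    references: Dict[str, Set[str]], min_shared: int = 1
) -> Dict[Tuple[str, str], int]:
    """Weighted bibliographic-coupling edges between documents.

    references maps each document ID to the set of references it cites; the
    weight of (doc_a, doc_b) is the number of references both cite.
    """
    # Invert doc -> refs into ref -> citing docs, so only overlapping pairs are visited.
    citing_docs: Dict[str, Set[str]] = {}
    for doc, refs in references.items():
        for ref in refs:
            citing_docs.setdefault(ref, set()).add(doc)

    shared: Counter = Counter()
    for docs in citing_docs.values():
        for pair in combinations(sorted(docs), 2):  # sorted: stable (a, b) ordering
            shared[pair] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


refs = {"A": {"r1", "r2", "r3"}, "B": {"r2", "r3"}, "C": {"r3", "r4"}}
print(coupling_edges(refs, min_shared=2))  # {('A', 'B'): 2}
```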

class pysyrev.core.config.CocitationNetworkConfig(use_resolved=False, use_unresolved=False, min_cocitations=1)

Bases: ConfigField

Parameters:

use_resolved (bool)
use_unresolved (bool)
min_cocitations (int)

use_resolved: bool = False

use_unresolved: bool = False

min_cocitations: int = 1
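Co-citation is the dual of bibliographic coupling: two cited references are linked when they appear together in the same reference lists. The sketch assumes min_cocitations is the minimum number of citing documents required for an edge. cocitation_edges is a hypothetical helper, and use_resolved / use_unresolved are not modelled.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def cocitation_edges(
    references: Dict[str, Set[str]], min_cocitations: int = 1
) -> Dict[Tuple[str, str], int]:
    """Weighted co-citation edges between cited references.

    The weight of (ref_a, ref_b) is the number of documents whose reference
    list contains both.
    """
    together: Counter = Counter()
    for refs in references.values():
        for pair in combinations(sorted(refs), 2):  # sorted: stable (a, b) ordering
            together[pair] += 1
    return {pair: n for pair, n in together.items() if n >= min_cocitations}


refs = {"A": {"r1", "r2", "r3"}, "B": {"r2", "r3"}, "C": {"r3", "r4"}}
print(cocitation_edges(refs, min_cocitations=2))  # {('r2', 'r3'): 2}
```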

class pysyrev.core.config.BibNetworkExportConfig(export_dir, run_name=None, coupling_graph=None, cocitation_graph=None)

Bases: ConfigField

Output configuration for the bib_network stage.

Each run is stored in <export_dir>/<run_name>/. Leave run_name blank to auto-generate a timestamp. Call resolve() to finalise the run directory and set file paths.

Parameters:

export_dir (str)
run_name (None | str)
coupling_graph (None | str)
cocitation_graph (None | str)

export_dir: str

run_name: None | str = None

coupling_graph: None | str = None

cocitation_graph: None | str = None

resolve()

Create the run directory and set output file paths.

topic_model section

class pysyrev.core.config.TopicModelConfig(export, doc_dataset=None, distance='euclidean', keep_n_results=10, coherence_scorer=None, hdbscan=None, umap=None, bertopic=None, berteley=None, ctfidf=None, topic_distribution=None)

Bases: ConfigField

Parameters:

export (TopicExportConfig)
doc_dataset (None | str)
distance (str)
keep_n_results (int)
coherence_scorer (CoherenceScorerConfig)
hdbscan (HDBSCANConfig)
umap (UMAPConfig)
bertopic (BertopicConfig)
berteley (BerteleyConfig)
ctfidf (CTFIDFConfig)
topic_distribution (TopicDistributionConfig)

export: TopicExportConfig

doc_dataset: None | str = None

distance: str = 'euclidean'

keep_n_results: int = 10

coherence_scorer: CoherenceScorerConfig = None

hdbscan: HDBSCANConfig = None

umap: UMAPConfig = None

bertopic: BertopicConfig = None

berteley: BerteleyConfig = None

ctfidf: CTFIDFConfig = None

topic_distribution: TopicDistributionConfig = None

class pysyrev.core.config.HDBSCANConfig(min_topic_size_range=<factory>, min_sample_range=<factory>, topic_size_step=1, min_sample_step=1, cluster_selection_method='leaf', metric='euclidean', prediction_data=True)

Bases: ConfigField

Parameters:

min_topic_size_range (List[int])
min_sample_range (List[int])
topic_size_step (int)
min_sample_step (int)
cluster_selection_method (str)
metric (str)
prediction_data (bool)

min_topic_size_range: List[int]

min_sample_range: List[int]

topic_size_step: int = 1

min_sample_step: int = 1

cluster_selection_method: str = 'leaf'

metric: str = 'euclidean'

prediction_data: bool = True

class pysyrev.core.config.UMAPConfig(n_neighbors=<factory>, n_components=<factory>, metric='cosine', min_dist=0.0, low_memory=False, random_state=42)

Bases: ConfigField

Parameters:

n_neighbors (List[int])
n_components (List[int])
metric (str)
min_dist (float)
low_memory (bool)
random_state (int)

n_neighbors: List[int]

n_components: List[int]

metric: str = 'cosine'

min_dist: float = 0.0

low_memory: bool = False

random_state: int = 42
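The paired range / step fields in HDBSCANConfig and the list-valued UMAP fields suggest a grid search over clustering hyperparameters. The sketch below is a hypothetical expansion of that grid. It assumes each *_range is an inclusive [low, high] pair and maps min_topic_size to HDBSCAN's min_cluster_size, which is the convention BERTopic uses. keep_best shows one way keep_n_results could retain the top-scoring runs. The score itself (for example, from the coherence scorer) is not computed here, and all function names are hypothetical.

```python
import heapq
from itertools import product
from typing import Any, Dict, Iterator, List, Tuple

Kwargs = Dict[str, int]


def inclusive_range(bounds: List[int], step: int) -> List[int]:
    """Expand [low, high] into low, low + step, ... up to high (inclusive: assumed)."""
    low, high = bounds
    return list(range(low, high + 1, step))


def search_grid(
    min_topic_size_range: List[int],
    min_sample_range: List[int],
    topic_size_step: int,
    min_sample_step: int,
    n_neighbors: List[int],
    n_components: List[int],
) -> Iterator[Tuple[Kwargs, Kwargs]]:
    """Yield (umap_kwargs, hdbscan_kwargs) for every combination in the grid."""
    for nn, nc, size, samples in product(
        n_neighbors,
        n_components,
        inclusive_range(min_topic_size_range, topic_size_step),
        inclusive_range(min_sample_range, min_sample_step),
    ):
        # Key names follow the umap.UMAP and hdbscan.HDBSCAN constructor arguments.
        yield (
            {"n_neighbors": nn, "n_components": nc},
            {"min_cluster_size": size, "min_samples": samples},
        )


def keep_best(scored: List[Tuple[float, Any]], keep_n_results: int = 10) -> List[Tuple[float, Any]]:
    """Keep the keep_n_results highest-scoring (score, params) pairs, best first."""
    return heapq.nlargest(keep_n_results, scored, key=lambda item: item[0])


grid = list(search_grid([10, 20], [5, 10], 5, 5, n_neighbors=[15, 30], n_components=[5]))
print(len(grid))  # 2 * 1 * 3 * 2 = 12
print(grid[0])
```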

class pysyrev.core.config.BertopicConfig(transformer_model='allenai/specter2_base', n_gram_range='bigram', language='english', calculate_probabilities=True)

Bases: ConfigField

Parameters:

transformer_model (str)
n_gram_range (str)
language (str)
calculate_probabilities (bool)

transformer_model: str = 'allenai/specter2_base'

n_gram_range: str = 'bigram'

language: str = 'english'

calculate_probabilities: bool = True

class pysyrev.core.config.TopicExportConfig(export_dir, run_name=None)

Bases: ConfigField

Output configuration for the topic-model stage.

Each run is stored in its own sub-directory: <export_dir>/<run_name>/. Leave run_name blank to auto-generate a timestamp at run time (directory creation is deferred to TopicModel.run()).

Parameters:

export_dir (str)
run_name (None | str)

export_dir: str

run_name: None | str = None

topic_report / llm / report sections

class pysyrev.core.config.TopicReportConfig(run_dir=None, model_index=0, export_to=None)

Bases: ConfigField

Model-selection parameters for the topic-report stage.

Parameters:

run_dir (str)
model_index (int)
export_to (str)

run_dir: str = None

model_index: int = 0

export_to: str = None

class pysyrev.core.config.TopicLabelerConfig(provider, model_id, host=None, max_tokens=200, temperature=0.3, max_retries=2, max_concurrent_requests=5, n_repr_docs_for_labeling=3, system_prompt=None)

Bases: ConfigField

LLM configuration for generating human-readable topic labels.

Parameters:

provider (str)
model_id (str)
host (None | str)
max_tokens (int)
temperature (float)
max_retries (int)
max_concurrent_requests (int)
n_repr_docs_for_labeling (int)
system_prompt (None | str)

provider: str

model_id: str

host: None | str = None

max_tokens: int = 200

temperature: float = 0.3

max_retries: int = 2

max_concurrent_requests: int = 5

n_repr_docs_for_labeling: int = 3

system_prompt: None | str = None
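max_concurrent_requests and max_retries are typical client-side limits. Below is a minimal asyncio sketch of how they could be enforced when generating topic labels. It assumes that max_retries counts retries after the first attempt and that only the first n_repr_docs_for_labeling representative documents are sent. label_all and the injected label_one callable are hypothetical, and no real LLM provider is called.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def label_all(
    topics: Dict[int, List[str]],
    label_one: Callable[[List[str]], Awaitable[str]],
    max_concurrent_requests: int = 5,
    max_retries: int = 2,
    n_repr_docs_for_labeling: int = 3,
) -> Dict[int, str]:
    """Label every topic with at most max_concurrent_requests calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    async def label_topic(repr_docs: List[str]) -> str:
        docs = repr_docs[:n_repr_docs_for_labeling]  # send only the first N documents
        async with semaphore:  # waits while max_concurrent_requests calls are running
            for attempt in range(max_retries + 1):  # first attempt + max_retries retries
                try:
                    return await label_one(docs)
                except Exception:  # a real client would catch provider-specific errors
                    if attempt == max_retries:
                        raise
                    await asyncio.sleep(0.05 * 2 ** attempt)  # exponential backoff
        raise ValueError("max_retries must be >= 0")  # only reached if the loop never ran

    topic_ids = list(topics)
    labels = await asyncio.gather(*(label_topic(topics[t]) for t in topic_ids))
    return dict(zip(topic_ids, labels))


# Usage with a fake labeller that fails on the first call for each topic.
attempts: Dict[Tuple[str, ...], int] = {}


async def flaky_labeler(docs: List[str]) -> str:
    key = tuple(docs)
    attempts[key] = attempts.get(key, 0) + 1
    if attempts[key] == 1:
        raise RuntimeError("simulated transient failure")
    return " / ".join(doc.split()[0] for doc in docs)


topics = {
    0: ["graph neural networks", "graph embeddings", "graph theory", "unused fourth doc"],
    1: ["topic modelling survey"],
}
labels = asyncio.run(label_all(topics, flaky_labeler))
print(labels)  # {0: 'graph / graph / graph', 1: 'topic'}
```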

class pysyrev.core.config.ReportConfig(meta=None, sections=None)

Bases: ConfigField

Parameters:

meta (None | ReportMetaConfig)
sections (None | ReportSectionsConfig)

meta: None | ReportMetaConfig = None

sections: None | ReportSectionsConfig = None

class pysyrev.core.config.ReportMetaConfig(title='Bibliographic report — Pysyrev', subtitle=None, author='Report generated with the pysyrev engine (v0.1)', date_format='%d/%m/%Y', version='1.0.0', summary=None)

Bases: ConfigField

Parameters:

title (str)
subtitle (None | str)
author (str)
date_format (str)
version (str)
summary (None | str)

title: str = 'Bibliographic report — Pysyrev'

subtitle: None | str = None

author: str = 'Report generated with the pysyrev engine (v0.1)'

date_format: str = '%d/%m/%Y'

version: str = '1.0.0'

summary: None | str = None

class pysyrev.core.config.ReportSectionsConfig(topics=None, bib_network=None, temporal=None, topic_characteristics=None, topic_similarity=None, paper_selection=None, extra=None)

Bases: ConfigField

Parameters:

topics (TopicsSectionConfig)
bib_network (BibNetworkSectionConfig)
temporal (TemporalSectionConfig)
topic_characteristics (TopicCharacteristicsConfig)
topic_similarity (TopicSimilarityConfig)
paper_selection (PaperSelectionConfig)
extra (None | List[dict])

topics: TopicsSectionConfig = None

bib_network: BibNetworkSectionConfig = None

temporal: TemporalSectionConfig = None

topic_characteristics: TopicCharacteristicsConfig = None

topic_similarity: TopicSimilarityConfig = None

paper_selection: PaperSelectionConfig = None

extra: None | List[dict] = None