Configuration classes
These dataclasses mirror the YAML structure. They are produced by Config.load() and consumed by the runtime pipeline classes.
Root
- class pysyrev.core.config.Config(env=None, bib=None, review=None, bib_network=None, topic_model=None, topic_report=None, report=None, llm=None, bib_network_graphs=None)
Bases: object
Root configuration object.
All sections are optional: only the sections present in the YAML are executed. The canonical stage order is:
bib → review → bib-network → topic-model → topic-report.
Config.load() propagates outputs between stages automatically when doc_dataset / run_dir are left blank, so a full-pipeline YAML requires no explicit cross-section paths.
- Parameters:
env (None | str)
bib (None | BibConfig)
review (None | ReviewConfig)
bib_network (None | BibNetworkConfig)
topic_model (None | TopicModelConfig)
topic_report (None | TopicReportConfig)
report (None | ReportConfig)
llm (None | TopicLabelerConfig)
bib_network_graphs (None | BibNetworkReportConfig)
- review: None | ReviewConfig = None
- bib_network: None | BibNetworkConfig = None
- topic_model: None | TopicModelConfig = None
- topic_report: None | TopicReportConfig = None
- report: None | ReportConfig = None
- llm: None | TopicLabelerConfig = None
- classmethod load(config_file)
Load a YAML config file.
- Steps:
1. Read the YAML.
2. Load the .env file referenced by the root-level env: key (if any).
3. Resolve all ${VAR} references.
4. Propagate outputs between stages when doc_dataset / run_dir are left blank (auto-detection of the latest run in each export_dir).
5. Build typed dataclasses for every section present.
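The variable-resolution and auto-detection steps above can be illustrated with a minimal sketch. The helper names resolve_vars and latest_run_dir are hypothetical and are not part of pysyrev. Leaving unknown variables untouched and picking the latest run by modification time are both assumptions; the actual loader may behave differently.

```python
import os
import re
from pathlib import Path

# Matches ${VAR} placeholders inside string values.
_VAR_PATTERN = re.compile(r"\$\{([^}]+)\}")


def resolve_vars(node, env=None):
    """Recursively substitute ${VAR} in a parsed YAML tree (hypothetical helper)."""
    env = os.environ if env is None else env
    if isinstance(node, dict):
        return {key: resolve_vars(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_vars(item, env) for item in node]
    if isinstance(node, str):
        # Assumption: unknown variables are left as-is rather than raising.
        return _VAR_PATTERN.sub(lambda m: env.get(m.group(1), m.group(0)), node)
    return node


def latest_run_dir(export_dir):
    """Return the most recently modified sub-directory of export_dir, or None.

    Assumption: "latest run" is decided by modification time.
    """
    root = Path(export_dir)
    if not root.is_dir():
        return None
    runs = [path for path in root.iterdir() if path.is_dir()]
    return max(runs, key=lambda path: path.stat().st_mtime, default=None)
```

For example, resolve_vars({"export_dir": "${OUT}/bib"}, {"OUT": "/data"}) yields {"export_dir": "/data/bib"}; non-string values such as integers pass through unchanged.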
bib section
- class pysyrev.core.config.BibConfig(wos=None, open_alex=None, scopus=None, pubmed=None, export=None, clean=None, extract=None, resolve_references=None, merge=None)
Bases: ConfigField
- Parameters:
wos (None | str | WosSourceConfig)
open_alex (None | str | OpenAlexSourceConfig)
scopus (None | str)
pubmed (None | str)
export (None | BibExportConfig)
clean (CleanConfig)
extract (ExtractConfig)
resolve_references (ResolveReferencesConfig)
merge (MergeConfig)
- wos: None | str | WosSourceConfig = None
- open_alex: None | str | OpenAlexSourceConfig = None
- export: None | BibExportConfig = None
- clean: CleanConfig = None
- extract: ExtractConfig = None
- resolve_references: ResolveReferencesConfig = None
- merge: MergeConfig = None
- class pysyrev.core.config.WosSourceConfig(source='file', file=None, api=None)
Bases: ConfigField
One WoS source: either a file path or an API config. Exactly one of file / api must be set.
- Parameters:
source (str)
file (None | str)
api (None | WosApiConfig)
- api: None | WosApiConfig = None
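The "exactly one of file / api" rule can be expressed as a small validation step. The sketch below uses a hypothetical stand-in class, SourceSketch, rather than the real WosSourceConfig. Where pysyrev performs this check and which exception it raises are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceSketch:
    """Hypothetical stand-in for a source section; not the pysyrev class."""

    source: str = "file"
    file: Optional[str] = None
    api: Optional[dict] = None

    def __post_init__(self):
        # Reject both "neither set" and "both set": exactly one must be given.
        if (self.file is None) == (self.api is None):
            raise ValueError("set exactly one of 'file' or 'api'")
```

With this sketch, SourceSketch(file="records.txt") is accepted, while SourceSketch() and SourceSketch(file="records.txt", api={}) both raise ValueError.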
- class pysyrev.core.config.WosApiConfig(api_key, query, cache_dir=None)
Bases: ConfigField
Configuration for retrieving WoS records via the Expanded API.
- class pysyrev.core.config.OpenAlexSourceConfig(source='file', file=None, api=None)
Bases: ConfigField
One OpenAlex source: either a file path or an API config. Exactly one of file / api must be set.
- Parameters:
source (str)
file (None | str)
api (None | OpenAlexApiConfig)
- api: None | OpenAlexApiConfig = None
- class pysyrev.core.config.OpenAlexApiConfig(api_key, email=None, query=None, filters=None, cache_dir=None)
Bases: ConfigField
Configuration for retrieving works via the OpenAlex API.
At least one of query (free-text BM25) or filters (structured) must be set; the two can also be combined. email enables the polite pool; it is optional but strongly recommended for non-trivial usage.
- class pysyrev.core.config.CleanConfig(min_signals_to_reject=2, extra_garbage_phrases=None, use_langdetect=False)
Bases: ConfigField
- class pysyrev.core.config.ExtractConfig(include_doc_type=None, exclude_doc_type=None, year=1900, nb_citations=0, language=None, scorer='partial_token_sort_ratio', score_cutoff=90)
Bases: ConfigField
- class pysyrev.core.config.MergeConfig(title_similarity=98, ngram_size=3, max_candidates_per_row=200, scorer='token_set_ratio')
Bases: ConfigField
- class pysyrev.core.config.ResolveReferencesConfig(enabled=False, flag_unresolved=False, fuzzy_score_cutoff=90, ngram_size=3, max_candidates=50, scorer='token_set_ratio')
Bases: ConfigField
- class pysyrev.core.config.BibExportConfig(export_dir, run_name=None, dataset=None)
Bases: ConfigField
Output configuration for the bib stage.
Each run is stored in <export_dir>/<run_name>/bib_dataset.csv. Call resolve() (done automatically by BibDataset.from_config) to finalise the run directory and set dataset on the instance. Leave run_name blank to auto-generate a timestamp.
review section
- class pysyrev.core.config.ReviewConfig(export, text_inputs, inclusion_criteria, exclusion_criteria, reviewers, workflow, doc_dataset=None, batch_size=100, api_pause=30.0, decision_rule='majority', sample_size=None, max_retries=None, max_concurrent_requests=None, items_per_call=None)
Bases: ConfigField
- Parameters:
export (ReviewExportConfig)
inclusion_criteria (str)
exclusion_criteria (str)
reviewers (List[ReviewerConfig])
doc_dataset (None | str)
batch_size (int)
api_pause (float)
decision_rule (str)
sample_size (None | int)
max_retries (None | int)
max_concurrent_requests (None | int)
items_per_call (None | int)
- export: ReviewExportConfig
- reviewers: List[ReviewerConfig]
- class pysyrev.core.config.ReviewerConfig(model_id, host, provider, name, max_tokens, temperature, reasoning_effort, backstory, additional_context, reasoning='brief', max_retries=None, max_concurrent_requests=None, items_per_call=None)
Bases: ConfigField
Mirror of one entry under review.reviewers in the YAML. Cross-section fields like inclusion_criteria are NOT here; they live at the ReviewConfig level and are wired together by the runtime layer.
- class pysyrev.core.config.ReviewExportConfig(export_dir, run_name=None, cache_dir=None, included_docs=None, total_docs=None)
Bases: ConfigField
Output configuration for the review stage.
Declare the parent directory (export_dir) and an optional run label (run_name). If run_name is left blank, resolve() generates a timestamp name (YYYY-MM-DDTHHMMSS) at run time so that successive test runs never overwrite each other.
resolve() must be called before the review runs (done automatically by LLMReview.from_config). It creates the run directory, sets included_docs / total_docs on the instance, and defaults cache_dir to <run_dir>/cache/ when not explicitly provided.
Downstream sections (bib_network, topic_model) can reference the output via config.review.export.included_docs after resolve(), or leave doc_dataset blank to have Config.load auto-detect the most recent run.
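The resolve() behaviour described for the review export can be approximated as follows. ExportSketch is a hypothetical stand-in, not the pysyrev class. The output file names included_docs.csv and total_docs.csv are assumptions chosen for illustration, as is the optional now argument, which only makes the timestamp reproducible.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Optional


@dataclass
class ExportSketch:
    """Hypothetical stand-in for a review export section."""

    export_dir: str
    run_name: Optional[str] = None
    cache_dir: Optional[str] = None
    included_docs: Optional[str] = None
    total_docs: Optional[str] = None

    def resolve(self, now=None):
        """Finalise the run directory and fill in the derived paths."""
        now = now or datetime.now()
        if not self.run_name:
            # Timestamp name in the documented YYYY-MM-DDTHHMMSS form.
            self.run_name = now.strftime("%Y-%m-%dT%H%M%S")
        run_dir = Path(self.export_dir) / self.run_name
        run_dir.mkdir(parents=True, exist_ok=True)
        # Assumed file names; the real ones are not given in this page.
        self.included_docs = str(run_dir / "included_docs.csv")
        self.total_docs = str(run_dir / "total_docs.csv")
        if self.cache_dir is None:
            # Default only when the user did not provide a cache directory.
            self.cache_dir = str(run_dir / "cache")
        return self
```

Because a blank run_name becomes a second-resolution timestamp, two runs started at different times write to different directories, which is the property the description relies on.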
bib_network section
- class pysyrev.core.config.BibNetworkConfig(doc_dataset=None, coupling_network=None, cocitation_network=None, export=None)
Bases: ConfigField
- Parameters:
doc_dataset (str)
coupling_network (CouplingNetworkConfig)
cocitation_network (CocitationNetworkConfig)
export (None | BibNetworkExportConfig)
- coupling_network: CouplingNetworkConfig = None
- cocitation_network: CocitationNetworkConfig = None
- export: None | BibNetworkExportConfig = None
- class pysyrev.core.config.CouplingNetworkConfig(use_resolved=False, use_unresolved=False, min_shared=1)
Bases: ConfigField
- class pysyrev.core.config.CocitationNetworkConfig(use_resolved=False, use_unresolved=False, min_cocitations=1)
Bases: ConfigField
- class pysyrev.core.config.BibNetworkExportConfig(export_dir, run_name=None, coupling_graph=None, cocitation_graph=None)
Bases: ConfigField
Output configuration for the bib_network stage.
Each run is stored in <export_dir>/<run_name>/. Leave run_name blank to auto-generate a timestamp. Call resolve() to finalise the run directory and set file paths.
topic_model section
- class pysyrev.core.config.TopicModelConfig(export, doc_dataset=None, distance='euclidean', keep_n_results=10, coherence_scorer=None, hdbscan=None, umap=None, bertopic=None, berteley=None, ctfidf=None, topic_distribution=None)
Bases: ConfigField
- Parameters:
export (TopicExportConfig)
doc_dataset (None | str)
distance (str)
keep_n_results (int)
coherence_scorer (CoherenceScorerConfig)
hdbscan (HDBSCANConfig)
umap (UMAPConfig)
bertopic (BertopicConfig)
berteley (BerteleyConfig)
ctfidf (CTFIDFConfig)
topic_distribution (TopicDistributionConfig)
- export: TopicExportConfig
- coherence_scorer: CoherenceScorerConfig = None
- hdbscan: HDBSCANConfig = None
- umap: UMAPConfig = None
- bertopic: BertopicConfig = None
- berteley: BerteleyConfig = None
- ctfidf: CTFIDFConfig = None
- topic_distribution: TopicDistributionConfig = None
- class pysyrev.core.config.HDBSCANConfig(min_topic_size_range=<factory>, min_sample_range=<factory>, topic_size_step=1, min_sample_step=1, cluster_selection_method='leaf', metric='euclidean', prediction_data=True)
Bases: ConfigField
- class pysyrev.core.config.UMAPConfig(n_neighbors=<factory>, n_components=<factory>, metric='cosine', min_dist=0.0, low_memory=False, random_state=42)
Bases: ConfigField
- class pysyrev.core.config.BertopicConfig(transformer_model='allenai/specter2_base', n_gram_range='bigram', language='english', calculate_probabilities=True)
Bases: ConfigField
- class pysyrev.core.config.TopicExportConfig(export_dir, run_name=None)
Bases: ConfigField
Output configuration for the topic-model stage.
Each run is stored in its own sub-directory: <export_dir>/<run_name>/. Leave run_name blank to auto-generate a timestamp at run time (directory creation is deferred to TopicModel.run()).
topic_report / llm / report sections
- class pysyrev.core.config.TopicReportConfig(run_dir=None, model_index=0, export_to=None)
Bases: ConfigField
Model-selection parameters for the topic-report stage.
- class pysyrev.core.config.TopicLabelerConfig(provider, model_id, host=None, max_tokens=200, temperature=0.3, max_retries=2, max_concurrent_requests=5, n_repr_docs_for_labeling=3, system_prompt=None)
Bases: ConfigField
LLM configuration for generating human-readable topic labels.
- class pysyrev.core.config.ReportConfig(meta=None, sections=None)
Bases: ConfigField
- Parameters:
meta (None | ReportMetaConfig)
sections (None | ReportSectionsConfig)
- meta: None | ReportMetaConfig = None
- sections: None | ReportSectionsConfig = None
- class pysyrev.core.config.ReportMetaConfig(title='Bibliographic report — Pysyrev', subtitle=None, author='Report generated with the pysyrev engine (v0.1)', date_format='%d/%m/%Y', version='1.0.0', summary=None)
Bases: ConfigField
- class pysyrev.core.config.ReportSectionsConfig(topics=None, bib_network=None, temporal=None, topic_characteristics=None, topic_similarity=None, paper_selection=None, extra=None)
Bases: ConfigField
- topics: TopicsSectionConfig = None
- bib_network: BibNetworkSectionConfig = None
- temporal: TemporalSectionConfig = None
- topic_characteristics: TopicCharacteristicsConfig = None
- topic_similarity: TopicSimilarityConfig = None
- paper_selection: PaperSelectionConfig = None