Topic model

class pysyrev.TopicModel(doc_dataset, allow_abbrev, distance, bertopic_model, topic_distribution, nr_repr_docs, export_dir, n_neighbors, n_components, min_topic_size_range, min_sample_range, topic_size_step, min_sample_step, keep_n_results, ranking_scorer, purity_scorer, run_name, overwrite=False)

Bases: object

Parameters:
  • doc_dataset (str)

  • allow_abbrev (bool)

  • distance (str)

  • bertopic_model (BertopicModel)

  • topic_distribution (TopicDistribution)

  • nr_repr_docs (int)

  • export_dir (str)

  • n_neighbors (List[int])

  • n_components (List[int])

  • min_topic_size_range (List[int])

  • min_sample_range (List[int])

  • topic_size_step (int)

  • min_sample_step (int)

  • keep_n_results (int)

  • ranking_scorer (str)

  • purity_scorer (str)

  • run_name (None | str)

  • overwrite (bool)
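
The list-valued and step parameters above suggest a search over several candidate settings. The reference does not say how the ranges are interpreted, so the following is only a sketch under a loud assumption: that each *_range is an inclusive [low, high] pair walked with the matching step, and that n_neighbors and n_components are explicit candidate lists. The helpers expand_range and candidate_grid are hypothetical and not part of pysyrev.

```python
from itertools import product


def expand_range(bounds, step):
    """Expand an inclusive [low, high] pair into low, low + step, ... (<= high).

    ASSUMPTION: this is one plausible reading of min_topic_size_range /
    min_sample_range; the pysyrev reference does not define the semantics.
    """
    low, high = bounds
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return list(range(low, high + 1, step))


def candidate_grid(n_neighbors, n_components,
                   min_topic_size_range, topic_size_step,
                   min_sample_range, min_sample_step):
    """Return every combination of the candidate values as a list of dicts."""
    topic_sizes = expand_range(min_topic_size_range, topic_size_step)
    sample_sizes = expand_range(min_sample_range, min_sample_step)
    return [
        {
            "n_neighbors": nn,
            "n_components": nc,
            "min_topic_size": ts,
            "min_samples": ms,
        }
        for nn, nc, ts, ms in product(n_neighbors, n_components,
                                      topic_sizes, sample_sizes)
    ]


grid = candidate_grid(
    n_neighbors=[10, 15],
    n_components=[5],
    min_topic_size_range=[10, 30],
    topic_size_step=10,
    min_sample_range=[5, 10],
    min_sample_step=5,
)
print(len(grid))  # 2 * 1 * 3 * 2 = 12 combinations
```

Under this reading, the size of the grid grows multiplicatively with each list, which is presumably why keep_n_results limits how many results are retained.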

doc_dataset: str
allow_abbrev: bool
distance: str
bertopic_model: BertopicModel
topic_distribution: TopicDistribution
nr_repr_docs: int
export_dir: str
n_neighbors: List[int]
n_components: List[int]
min_topic_size_range: List[int]
min_sample_range: List[int]
topic_size_step: int
min_sample_step: int
keep_n_results: int
ranking_scorer: str
purity_scorer: str
run_name: None | str
overwrite: bool = False
classmethod from_config(config)

Build a TopicModel from a full Config object.

Return type:

TopicModel

Parameters:

config (Config)

run(dataset=None, show_progress=True)

Run the topic modelling pipeline.

Parameters:

  • dataset (pd.DataFrame, optional) – Dataset of reviewed and included documents. If None, it is loaded from doc_dataset (set via the config or auto-detected by Config.load).

  • show_progress (bool, optional) – Defaults to True.
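
A minimal sketch of how from_config and run fit together, using only the signatures listed above. The helper run_topic_pipeline is hypothetical, and how the Config object is obtained is left to the caller because Config.load's signature is not documented here. The return value of run is also undocumented, so the helper simply passes it through.

```python
def run_topic_pipeline(config, dataset=None, show_progress=True):
    """Build a TopicModel from `config` and run the pipeline once.

    `config` is expected to be a pysyrev Config object; this helper is an
    illustration and not part of pysyrev itself.
    """
    # Imported lazily so the helper can be defined without pysyrev installed.
    from pysyrev import TopicModel

    model = TopicModel.from_config(config)
    # With dataset=None, run() loads the data from model.doc_dataset.
    return model.run(dataset=dataset, show_progress=show_progress)
```

Passing an explicit pd.DataFrame as dataset skips loading from doc_dataset, which can be convenient when the data is already in memory.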

TopicReport

class pysyrev.TopicReport(run_dir, report_config, model_index=0, export_to=None, labeler_config=None, bib_network_config=None)

Bases: object

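Parameters:
  • run_dir (str)

  • report_config (object)

  • model_index (int)

  • export_to (None | str)

  • labeler_config (object)

  • bib_network_config (object)

run_dir: str
report_config: object
model_index: int = 0
export_to: None | str = None
labeler_config: object = None
bib_network_config: object = None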
classmethod from_config(config)

Return type:

TopicReport

Parameters:

config (Config)

property best_results: DataFrame
property selected_model_row: Series
property topic_info: DataFrame
property bertopic_results: DataFrame
property nb_topics: int
property metrics: Series
generate_report(output_file=None)

Generate the PDF report. Returns the path of the written file.

Return type:

str

Parameters:

output_file (None | str)
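
A minimal sketch of producing the PDF from a Config object, using only the members listed above (from_config, the nb_topics property, and generate_report). The helper write_topic_report is hypothetical; what path is chosen when output_file is None is not documented here, so the sketch just returns whatever generate_report reports.

```python
def write_topic_report(config, output_file=None):
    """Build a TopicReport from `config`, write the PDF, return its path.

    `config` is expected to be a pysyrev Config object; this helper is an
    illustration and not part of pysyrev itself.
    """
    # Imported lazily so the helper can be defined without pysyrev installed.
    from pysyrev import TopicReport

    report = TopicReport.from_config(config)
    # nb_topics is documented as an int property of the report.
    print(f"Selected model has {report.nb_topics} topics")
    # generate_report returns the path of the written file as a str.
    return report.generate_report(output_file=output_file)
```

The other properties (best_results, topic_info, bertopic_results, metrics) can be inspected on the same report object before the PDF is written.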