fsrl.utils

BaseLogger

class fsrl.utils.BaseLogger(log_dir=None, log_txt=True, name=None)

Bases: ABC

The base class for any logger compatible with the trainer. All loggers create four panels by default: train, test, loss, and update. Override the write() method to customize your own logger.

Parameters:
  • log_dir (str) – the log directory. Defaults to None.

  • log_txt (bool) – whether to log data to a file named progress.txt in log_dir. Defaults to True.

  • name (str) – the experiment name. If None, the current time is used as the name. Defaults to None.

setup_checkpoint_fn(checkpoint_fn: Callable | None = None) → None

Set up the function that obtains the model checkpoint. It is called by `logger.save_checkpoint()`.

Parameters:

checkpoint_fn (Optional[Callable]) – the hook function that returns the checkpoint dictionary. Defaults to None.

reset_data() → None

Reset the stored data.

store(tab: str | None = None, **kwargs) → None

Store any values in the current epoch buffer, with keys prefixed by tab/.

Example use:

    logger.store(tab="train", reward=10.0, cost=2.0)  # stored as train/reward and train/cost

Parameters:

tab (str) – the prefix of the logged keys. Defaults to None.

write(step: int, display: bool = False, display_keys: Iterable[str] | None = None) → None

Write the stored data to the log outputs and reset it.

Parameters:
  • step (int) – the current training step or epoch.

  • display (bool) – whether to print the logged data in the terminal. Defaults to False.

  • display_keys (Iterable[str]) – the keys to print. If None, print all stored keys. Defaults to None.
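The interplay between store() and write() can be illustrated with a small, self-contained sketch. It assumes that stored values are kept per key as lists, that keys are prefixed with tab/, and that write() averages each key before clearing the buffer. EpochBufferSketch is a hypothetical name, not fsrl's code, and unlike BaseLogger.write() the sketch returns the averages so they can be inspected.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional


class EpochBufferSketch:
    """Hypothetical stand-in for the epoch buffer behind store() and write()."""

    def __init__(self) -> None:
        self.data: Dict[str, List[float]] = defaultdict(list)

    def store(self, tab: Optional[str] = None, **kwargs: float) -> None:
        # Each keyword becomes a key, prefixed with "tab/" when tab is given.
        for key, value in kwargs.items():
            full_key = f"{tab}/{key}" if tab is not None else key
            self.data[full_key].append(float(value))

    def write(self, step: int, display: bool = False,
              display_keys: Optional[Iterable[str]] = None) -> Dict[str, float]:
        # Average every key stored since the last write.
        means = {key: mean(values) for key, values in self.data.items()}
        if display:
            keys = list(display_keys) if display_keys is not None else list(means)
            print(f"step {step}: " + ", ".join(f"{k}={means[k]:.3f}" for k in keys))
        self.data.clear()  # reset the buffer for the next epoch
        return means


buffer = EpochBufferSketch()
buffer.store(tab="train", reward=1.0)
buffer.store(tab="train", reward=3.0)
buffer.store(tab="test", cost=0.5)
epoch_means = buffer.write(step=1, display=True)
```

Averaging at write time keeps store() cheap, so it can be called once per episode without extra bookkeeping.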

write_without_reset(*args, **kwarg) → None

Write data without resetting the currently stored stats. Used by the TensorBoard and W&B loggers.

save_checkpoint(suffix: int | str | None = None) → None

Save the checkpoint obtained from the function registered with setup_checkpoint_fn(). This is called when the trainer invokes save_checkpoint_fn.

Parameters:

suffix (Optional[Union[int, str]]) – the suffix to append to the stored checkpoint name. Defaults to None.
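The checkpoint hook described by setup_checkpoint_fn() and save_checkpoint() follows a simple callback pattern, sketched below under stated assumptions: the hook returns a dictionary, the suffix is appended to a base file name, and the result is pickled into a checkpoint/ subfolder of the log directory. CheckpointSketch, the file-name scheme, and the use of pickle are all hypothetical; fsrl works with PyTorch models and may store checkpoints differently.

```python
import os
import pickle
import tempfile
from typing import Callable, Optional, Union


class CheckpointSketch:
    """Hypothetical illustration of the setup_checkpoint_fn/save_checkpoint pattern."""

    def __init__(self, log_dir: str) -> None:
        self.log_dir = log_dir
        self.checkpoint_fn: Optional[Callable[[], dict]] = None

    def setup_checkpoint_fn(self, checkpoint_fn: Optional[Callable[[], dict]] = None) -> None:
        self.checkpoint_fn = checkpoint_fn

    def save_checkpoint(self, suffix: Optional[Union[int, str]] = None) -> Optional[str]:
        # Nothing to save if no hook has been registered.
        if self.checkpoint_fn is None:
            return None
        folder = os.path.join(self.log_dir, "checkpoint")
        os.makedirs(folder, exist_ok=True)
        # Hypothetical naming scheme: model.pkl, model_3.pkl, model_best.pkl, ...
        name = "model" if suffix is None else f"model_{suffix}"
        path = os.path.join(folder, name + ".pkl")
        with open(path, "wb") as f:
            pickle.dump(self.checkpoint_fn(), f)  # the hook is called only now
        return path  # returned for convenience; the documented method returns None


with tempfile.TemporaryDirectory() as tmp:
    logger = CheckpointSketch(tmp)
    logger.setup_checkpoint_fn(lambda: {"epoch": 3})
    saved_path = logger.save_checkpoint(suffix=3)
    with open(saved_path, "rb") as f:
        restored = pickle.load(f)
    saved_name = os.path.basename(saved_path)
```

Deferring the call to the hook means the logger never holds a stale copy of the model state.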

save_config(config: dict, verbose=True) → None

Log an experiment configuration.

Call this once at the top of your experiment, passing in all important config variables as a dict. The config is serialized to JSON; anything that can't be serialized is handled gracefully by writing it as informative a string as possible.

Example use:

    logger = BaseLogger(**logger_kwargs)
    logger.save_config(locals())

Parameters:
  • config (dict) – the configs to be stored.

  • verbose (bool) – whether to print the saved configs. Defaults to True.
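The graceful JSON fallback that save_config() describes could look roughly like the recursive helper below. It is a sketch under assumptions: convert_json_sketch is a hypothetical name, functions and classes are reduced to their __name__, other objects to their attribute dictionary or str(), and the real logger's rules may differ.

```python
import json
from typing import Any


def convert_json_sketch(obj: Any) -> Any:
    """Hypothetical helper: make obj JSON-serializable, degrading to strings."""
    try:
        json.dumps(obj)
        return obj  # already serializable, keep as-is
    except (TypeError, ValueError):
        pass
    if isinstance(obj, dict):
        return {str(k): convert_json_sketch(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [convert_json_sketch(v) for v in obj]
    if hasattr(obj, "__name__"):
        return obj.__name__  # functions and classes: keep their name
    if getattr(obj, "__dict__", None):
        # Plain objects: record the type name and their attributes.
        return {type(obj).__name__: convert_json_sketch(vars(obj))}
    return str(obj)


class OptimizerCfg:  # hypothetical non-serializable config object
    def __init__(self) -> None:
        self.eps = 0.1


config = {"lr": 1e-3, "activation": max, "hidden_sizes": (64, 64), "opt": OptimizerCfg()}
result = convert_json_sketch(config)
print(json.dumps(result, indent=2))
```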

restore_data() → None

Return the metadata from an existing log. Not implemented for BaseLogger.

get_std(key: str) → float

Get the standard deviation of the queried data in storage.

Parameters:

key (str) – the key of the queried data.

Returns:

the standard deviation.

get_mean(key: str) → float

Get the mean of the queried data in storage.

Parameters:

key (str) – the key of the queried data.

Returns:

the mean.

get_mean_list(keys: Iterable[str]) → list

Get the mean values of the queried keys as a list.

Parameters:

keys (Iterable[str]) – the keys of the queried data.

Returns:

the list of mean values.

get_mean_dict(keys: Iterable[str]) → dict

Get the mean values of the queried keys as a dict.

Parameters:

keys (Iterable[str]) – the keys of the queried data.

Returns:

the dict of mean values.
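The four query methods above all reduce the per-key list of stored values. A minimal sketch, assuming the buffer maps each key to a list of floats (the stats dictionary is hypothetical data, and the standalone functions only mirror the method names):

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, List

# Hypothetical buffer contents: key -> values stored during the current epoch.
stats: Dict[str, List[float]] = {
    "train/reward": [1.0, 2.0, 3.0],
    "train/cost": [0.0, 4.0],
}


def get_mean(key: str) -> float:
    return mean(stats[key])


def get_std(key: str) -> float:
    # Population standard deviation, as numpy.std computes by default;
    # whether fsrl uses the population or the sample estimate is an assumption.
    return pstdev(stats[key])


def get_mean_list(keys: Iterable[str]) -> List[float]:
    return [get_mean(k) for k in keys]


def get_mean_dict(keys: Iterable[str]) -> Dict[str, float]:
    return {k: get_mean(k) for k in keys}


print(get_mean_dict(["train/reward", "train/cost"]))
```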

property stats_mean: dict
property logger_keys: Iterable
display_tabular(display_keys: Iterable[str] | None = None) → None

Display the keys of interest in a tabular format.

Parameters:

display_keys (Iterable[str]) – the keys to display. If None, display all data. Defaults to None.
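A tabular display of this kind can be produced with plain string formatting. The sketch below assumes the values have already been averaged into a dictionary; format_tabular and its column layout are hypothetical and do not reproduce the exact output of display_tabular().

```python
from typing import Dict, Iterable, Optional


def format_tabular(means: Dict[str, float],
                   display_keys: Optional[Iterable[str]] = None) -> str:
    # Default to every key, in sorted order, when no subset is requested.
    keys = list(display_keys) if display_keys is not None else sorted(means)
    if not keys:
        return ""
    key_width = max(len(k) for k in keys)
    border = "-" * (key_width + 17)  # "| " + key + " | " + 10-char value + " |"
    rows = [f"| {k:>{key_width}} | {means[k]:>10.4g} |" for k in keys]
    return "\n".join([border, *rows, border])


print(format_tabular({"train/reward": 2.0, "train/cost": 0.25}))
```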

print(msg: str, color='green') → None

Print a colorized message to stdout.

Parameters:
  • msg (str) – the message to print.

  • color (str) – the color to print in. The choices are `gray, red, green, yellow, blue, magenta, cyan, white, crimson`. Defaults to “green”.
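Colorized terminal output is usually produced with ANSI escape sequences. The sketch below maps the listed color names to SGR codes; the mapping itself (notably "gray" as bright black and "crimson" as 256-color index 160) and the colorize name are assumptions for illustration, not fsrl's table.

```python
from typing import Dict

# Hypothetical color-name -> ANSI SGR code mapping (not taken from fsrl).
ANSI_CODES: Dict[str, str] = {
    "gray": "90", "red": "31", "green": "32", "yellow": "33", "blue": "34",
    "magenta": "35", "cyan": "36", "white": "37", "crimson": "38;5;160",
}


def colorize(msg: str, color: str = "green") -> str:
    if color not in ANSI_CODES:
        raise ValueError(f"unknown color {color!r}")
    # ESC[<code>m switches the color on; ESC[0m resets all attributes.
    return f"\x1b[{ANSI_CODES[color]}m{msg}\x1b[0m"


print(colorize("checkpoint saved", "green"))
```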

TensorboardLogger

class fsrl.utils.TensorboardLogger(log_dir: str | None = None, log_txt: bool = True, name: str | None = None)

Bases: BaseLogger

A logger that uses TensorBoard's SummaryWriter to visualize and log statistics.

Parameters:
  • log_dir (str) – the log directory. Defaults to None.

  • log_txt (bool) – whether to log data to a file named progress.txt in log_dir. Defaults to True.

  • name (str) – the experiment name. If None, the current time is used as the name. Defaults to None.

write(step: int, display: bool = True, display_keys: Iterable[str] | None = None) → None

Write the stored data to the log outputs and reset it.

Parameters:
  • step (int) – the current training step or epoch.

  • display (bool) – whether to print the logged data in the terminal. Defaults to True.

  • display_keys (Iterable[str]) – the keys to print. If None, print all stored keys. Defaults to None.

write_without_reset(step: int) → None

Write data to the TensorBoard event file without resetting the currently stored stats.

restore_data() → Tuple[int, int, int]

Return the metadata from an existing log. If nothing is found or an error occurs during recovery, the default values are returned.

Returns:

a tuple (episode, env_step, gradient_step) of type Tuple[int, int, int].

WandbLogger

class fsrl.utils.WandbLogger(config: dict = {}, project: str = 'fsrl', group: str = 'test', name: str | None = None, log_dir: str = 'log', log_txt: bool = True)

Bases: BaseLogger

Weights and Biases logger that sends data to https://wandb.ai/.

A typical usage example:

    config = {...}
    project = "test_cvpo"
    group = "SafetyCarCircle-v0"
    name = "default_param"
    log_dir = "logs"

    logger = WandbLogger(config, project, group, name, log_dir)
    logger.save_config(config)

    agent = CVPOAgent(env, logger=logger)
    agent.learn(train_envs)

Parameters:
  • config (dict) – experiment configurations. Defaults to an empty dict.

  • project (str) – W&B project name. Defaults to “fsrl”.

  • group (str) – W&B group name. Defaults to “test”.

  • name (str) – W&B experiment run name. If None, the current time is used as the name. Defaults to None.

  • log_dir (str) – the log directory. Defaults to “log”.

  • log_txt (bool) – whether to log data to a file named progress.txt in log_dir. Defaults to True.

write(step: int, display: bool = True, display_keys: Iterable[str] | None = None) → None

Write the stored data to the log outputs and reset it.

Parameters:
  • step (int) – the current training step or epoch.

  • display (bool) – whether to print the logged data in the terminal. Defaults to True.

  • display_keys (Iterable[str]) – the keys to print. If None, print all stored keys. Defaults to None.

write_without_reset(step: int) → None

Send data to W&B without resetting the currently stored stats.

restore_data() → None

Not implemented yet.

DummyLogger

class fsrl.utils.DummyLogger(*args, **kwarg)

Bases: BaseLogger

A logger that inherits from BaseLogger but does nothing. Used as a placeholder in the trainer.

Net

class fsrl.utils.net.common.ActorCritic(actor: Module, critics: List | Module)

Bases: Module

An actor-critic network wrapper for parsing parameters, i.e., collecting the actor and critic parameters in a single module.

Parameters:
  • actor (nn.Module) – the actor network.

  • critics (List | nn.Module) – the critic network, or a list of critic networks.

class fsrl.utils.net.continuous.DoubleCritic(preprocess_net1: nn.Module, preprocess_net2: nn.Module, hidden_sizes: Sequence[int] = (), device: str | int | torch.device = 'cpu', preprocess_net_output_dim: int | None = None, linear_layer: Type[nn.Linear] = nn.Linear, flatten_input: bool = True)

Bases: Module

Double critic network. Creates two critics operating in a continuous action space, each with the structure preprocess_net -> 1 (Q value).

Parameters:
  • preprocess_net1 – a self-defined preprocess_net that outputs a flattened hidden state.

  • preprocess_net2 – a self-defined preprocess_net that outputs a flattened hidden state.

  • hidden_sizes – a sequence of ints for constructing the MLP after preprocess_net. Defaults to an empty sequence (in which case the MLP contains only a single linear layer).

  • device – the device on which to run the model. Defaults to 'cpu'.

  • preprocess_net_output_dim (int) – the output dimension of preprocess_net.

  • linear_layer – the module to use as the linear layer. Defaults to nn.Linear.

  • flatten_input (bool) – whether to flatten the input data for the last layer. Defaults to True.

For advanced usage (how to customize the network), please refer to tianshou’s build_the_network tutorial.

See also

See tianshou’s Net class for an example of how preprocess_net is suggested to be defined.

forward(obs: ndarray | Tensor, act: ndarray | Tensor | None = None, info: Dict[str, Any] = {}) → list

Mapping: (s, a) -> logits -> Q(s, a).

predict(obs: ndarray | Tensor, act: ndarray | Tensor | None = None, info: Dict[str, Any] = {}) → Tuple[Tensor, list]

Mapping: (s, a) -> logits -> Q(s, a).

Returns:

the Q value, and a list of the two Q values (used for the Bellman backup).
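predict() returns two Q estimates for the Bellman backup. A common way to combine them is the clipped double-Q rule from TD3, which bootstraps from the element-wise minimum to reduce overestimation. The sketch below assumes that rule and uses plain Python lists in place of tensors; clipped_double_q_target is a hypothetical helper, and the relevant agent's source shows how fsrl actually combines the two values.

```python
from typing import List, Sequence


def clipped_double_q_target(rewards: Sequence[float], dones: Sequence[float],
                            next_q1: Sequence[float], next_q2: Sequence[float],
                            gamma: float = 0.99) -> List[float]:
    # y = r + gamma * (1 - done) * min(Q1(s', a'), Q2(s', a'))
    return [
        r + gamma * (1.0 - d) * min(q1, q2)
        for r, d, q1, q2 in zip(rewards, dones, next_q1, next_q2)
    ]


targets = clipped_double_q_target(
    rewards=[1.0, 0.0], dones=[0.0, 1.0],
    next_q1=[2.0, 5.0], next_q2=[3.0, 1.0], gamma=0.5,
)
print(targets)  # [2.0, 0.0]
```

Terminal transitions (done = 1) drop the bootstrap term entirely, which is why the second target equals its reward.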

class fsrl.utils.net.continuous.SingleCritic(preprocess_net: nn.Module, hidden_sizes: Sequence[int] = (), device: str | int | torch.device = 'cpu', preprocess_net_output_dim: int | None = None, linear_layer: Type[nn.Linear] = nn.Linear, flatten_input: bool = True)

Bases: Critic

Simple critic network. Creates a critic operating in a continuous action space with the structure preprocess_net -> 1 (Q value). It differs from tianshou’s original Critic in that its output is a list, which keeps the API consistent with DoubleCritic.

Parameters:
  • preprocess_net – a self-defined preprocess_net that outputs a flattened hidden state.

  • hidden_sizes – a sequence of ints for constructing the MLP after preprocess_net. Defaults to an empty sequence (in which case the MLP contains only a single linear layer).

  • device – the device on which to run the model. Defaults to 'cpu'.

  • preprocess_net_output_dim (int) – the output dimension of preprocess_net.

  • linear_layer – the module to use as the linear layer. Defaults to nn.Linear.

  • flatten_input (bool) – whether to flatten the input data for the last layer. Defaults to True.

forward(obs: ndarray | Tensor, act: ndarray | Tensor | None = None, info: Dict[str, Any] = {}) → Tensor

Mapping: (s, a) -> logits -> Q(s, a).

predict(obs: ndarray | Tensor, act: ndarray | Tensor | None = None, info: Dict[str, Any] = {}) → Tuple[Tensor, list]

Mapping: (s, a) -> logits -> Q(s, a).

Returns:

the Q value, and a list of Q values (used for the Bellman backup).

LagrangianOptimizer

class fsrl.utils.LagrangianOptimizer(pid: tuple = (0.05, 0.0005, 0.1))

Bases: object

Lagrangian multiplier optimizer based on a PID controller, following https://proceedings.mlr.press/v119/stooke20a.html.

Parameters:

pid (tuple) – the coefficients (kp, ki, kd) of the PID controller.

Note

If kp and kd are 0, it reduces to a standard SGD-based Lagrangian optimizer.
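For reference, a simplified form of the PID Lagrangian update from the cited paper is written below, with v_t the value passed to step(), d the threshold, and (K_p, K_i, K_d) the pid tuple. The paper also discusses smoothing the individual terms, so fsrl's code may differ in such details.

```latex
\begin{aligned}
\Delta_t   &= v_t - d \\
I_t        &= \max\left(0,\; I_{t-1} + \Delta_t\right) \\
\partial_t &= \max\left(0,\; v_t - v_{t-1}\right) \\
\lambda_t  &= \max\left(0,\; K_p \Delta_t + K_i I_t + K_d \partial_t\right)
\end{aligned}
```

With K_p = K_d = 0 and K_i > 0, this gives λ_t = K_i I_t = max(0, λ_{t-1} + K_i Δ_t), i.e., projected gradient ascent on λ with step size K_i, which is the SGD-based case mentioned in the note.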

step(value: float, threshold: float) → None

Optimize the multiplier by one step.

Parameters:
  • value (float) – the current value estimate.

  • threshold (float) – the threshold for the value.

get_lag() → float

Get the Lagrangian multiplier.

state_dict() → dict

Get the parameters of this Lagrangian optimizer.

load_state_dict(params: dict) → None

Load the parameters to continue training.
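A minimal sketch of an optimizer with the same interface (step, get_lag, state_dict, load_state_dict), implementing the PID Lagrangian update of Stooke et al. (2020) in simplified form without smoothing. PIDLagrangianSketch and its state-dict keys are hypothetical and do not reflect fsrl's internal attribute names.

```python
from typing import Dict, Tuple


class PIDLagrangianSketch:
    """Hypothetical PID Lagrangian multiplier mirroring the documented interface."""

    def __init__(self, pid: Tuple[float, float, float] = (0.05, 0.0005, 0.1)) -> None:
        self.kp, self.ki, self.kd = pid
        self.error_integral = 0.0
        self.prev_value = 0.0
        self.lagrangian = 0.0

    def step(self, value: float, threshold: float) -> None:
        error = value - threshold                                     # P term input
        self.error_integral = max(0.0, self.error_integral + error)  # I term, kept >= 0
        derivative = max(0.0, value - self.prev_value)                # D term, only rises count
        self.prev_value = value
        self.lagrangian = max(
            0.0, self.kp * error + self.ki * self.error_integral + self.kd * derivative
        )

    def get_lag(self) -> float:
        return self.lagrangian

    def state_dict(self) -> Dict[str, float]:
        return {"error_integral": self.error_integral,
                "prev_value": self.prev_value,
                "lagrangian": self.lagrangian}

    def load_state_dict(self, params: Dict[str, float]) -> None:
        self.error_integral = params["error_integral"]
        self.prev_value = params["prev_value"]
        self.lagrangian = params["lagrangian"]


opt = PIDLagrangianSketch(pid=(0.0, 0.5, 0.0))  # integral term only
history = []
for episode_cost in [3.0, 0.0, 0.0]:
    opt.step(episode_cost, threshold=1.0)
    history.append(opt.get_lag())
print(history)  # [1.0, 0.5, 0.0]
```

Keeping the integral clipped at zero prevents a long stretch of constraint satisfaction from building up "credit" that would delay the multiplier's response to a later violation.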

ExperimentUtils

fsrl.utils.exp_util.seed_all(seed=1029, others: list | None = None) → None

Fix the seeds of random, numpy, and torch, as well as of the objects passed in others.

Parameters:
  • seed (int) – the random seed. Defaults to 1029.

  • others (Optional[list]) – other objects to be seeded. Defaults to None.
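A sketch of what seeding random, numpy, torch, and extra objects typically involves. numpy and torch are imported lazily so the sketch runs without them. The function name is hypothetical, and so is the assumption that each object in others exposes a seed(int) method (as older Gym environments did).

```python
import os
import random
from typing import Any, List, Optional


def seed_all_sketch(seed: int = 1029, others: Optional[List[Any]] = None) -> None:
    """Hypothetical re-creation of seed_all: seed global RNGs plus extra objects."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses started later
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and CUDA generators in recent PyTorch
    except ImportError:
        pass
    # Assumption: objects in "others" expose a seed(int) method.
    for obj in others or []:
        if hasattr(obj, "seed"):
            obj.seed(seed)


class SeedRecorder:  # hypothetical object with a seed() method
    def __init__(self) -> None:
        self.seen = None

    def seed(self, s: int) -> None:
        self.seen = s


recorder = SeedRecorder()
seed_all_sketch(3, others=[recorder])
```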

fsrl.utils.exp_util.load_config_and_model(path: str, best: bool = False)

Load the configuration and trained model from a specified directory.

Parameters:
  • path – the directory path where the configuration and trained model are stored.

  • best – if True, load the best-performing model; otherwise, load the most recent one. Defaults to False.

Returns:

a tuple containing the configuration dictionary and the trained model.

Raises:

ValueError – if the specified directory does not exist.

fsrl.utils.exp_util.to_string(values)

Recursively convert a sequence or dictionary of values to a string representation.

Parameters:

values – the sequence or dictionary of values to be converted to a string.

Returns:

a string representation of the input values.
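The recursion described above can be sketched as follows: mappings and non-string sequences are converted element by element, and everything else falls back to str(). The "_" separator and the key-value concatenation are assumptions chosen to produce name-friendly strings; to_string_sketch is a hypothetical name and its output format is not fsrl's.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def to_string_sketch(values: Any) -> str:
    """Hypothetical recursive flattening of nested configs into a compact string."""
    if isinstance(values, Mapping):
        return "_".join(f"{to_string_sketch(k)}{to_string_sketch(v)}"
                        for k, v in values.items())
    # Strings are sequences too, so exclude them to avoid splitting characters.
    if isinstance(values, Sequence) and not isinstance(values, (str, bytes)):
        return "_".join(to_string_sketch(v) for v in values)
    return str(values)


print(to_string_sketch({"hidden_sizes": (128, 128), "lr": 0.001}))
```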

fsrl.utils.exp_util.auto_name(default_cfg: dict, current_cfg: dict, prefix: str = '', suffix: str = '', skip_keys: list = ['task', 'reward_threshold', 'logdir', 'worker', 'project', 'group', 'name', 'prefix', 'suffix', 'save_interval', 'render', 'verbose', 'save_ckpt', 'training_num', 'testing_num', 'epoch', 'device', 'thread'], key_abbre: dict = {'cost_limit': 'cost', 'estep_dual_lr': 'elr', 'estep_iter_num': 'enum', 'estep_kl': 'ekl', 'mstep_dual_lr': 'mlr', 'mstep_iter_num': 'mnum', 'mstep_kl_mu': 'kl_mu', 'mstep_kl_std': 'kl_std', 'update_per_step': 'update'}) → str

Automatically generate an experiment name by comparing the current config with the default one.

Parameters:
  • default_cfg (dict) – a dictionary containing the default configuration values.

  • current_cfg (dict) – a dictionary containing the current configuration values.

  • prefix (str) – (optional) a string to be added at the beginning of the generated name.

  • suffix (str) – (optional) a string to be added at the end of the generated name.

  • skip_keys (list) – (optional) a list of keys to be skipped when generating the name.

  • key_abbre (dict) – (optional) a dictionary containing abbreviations for keys in the generated name.

Returns:

the generated experiment name (str).
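A rough sketch of the comparison auto_name() performs: keys listed in skip_keys are ignored, keys whose value equals the default are dropped, and the remaining ones are abbreviated via key_abbre (for example 'cost_limit' -> 'cost', per the defaults in the signature) and joined into a name. The separators, value formatting, and "default" fallback are assumptions; auto_name_sketch is a hypothetical name, not fsrl's code.

```python
from typing import Any, Dict, Iterable, Optional


def _fmt(value: Any) -> str:
    # Flatten lists/tuples so sequences fit in a file name, e.g. (64, 64) -> "64_64".
    if isinstance(value, (list, tuple)):
        return "_".join(_fmt(v) for v in value)
    return str(value)


def auto_name_sketch(default_cfg: Dict[str, Any], current_cfg: Dict[str, Any],
                     prefix: str = "", suffix: str = "",
                     skip_keys: Iterable[str] = (),
                     key_abbre: Optional[Dict[str, str]] = None) -> str:
    key_abbre = key_abbre or {}
    skip = set(skip_keys)
    parts = []
    for key, value in current_cfg.items():
        if key in skip:
            continue
        # Only values that differ from the default contribute to the name.
        if key in default_cfg and default_cfg[key] == value:
            continue
        parts.append(f"{key_abbre.get(key, key)}{_fmt(value)}")
    name = "-".join(parts) if parts else "default"
    return "-".join(p for p in (prefix, name, suffix) if p)


default = {"task": "Car", "cost_limit": 10, "hidden_sizes": (128, 128), "epoch": 100}
current = {"task": "Ant", "cost_limit": 20, "hidden_sizes": (64, 64), "epoch": 300}
run_name = auto_name_sketch(default, current, prefix="cvpo",
                            skip_keys=["task", "epoch"],
                            key_abbre={"cost_limit": "cost"})
print(run_name)  # cvpo-cost20-hidden_sizes64_64
```

Naming runs by their diff from the defaults keeps names short while still distinguishing sweeps over a few hyperparameters.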