fsrl.utils

BaseLogger

class fsrl.utils.BaseLogger(log_dir=None, log_txt=True, name=None)

Bases: ABC

The base class for any logger that is compatible with the trainer. All loggers create four panels by default: train, test, loss, and update. Override the write() method to customize your own logger.

Parameters:

log_dir (str) – the log directory. Defaults to None.
log_txt (bool) – whether to log data in log_dir in a file named progress.txt. Defaults to True.
name (str) – the experiment name. If None, the current time is used as the name. Defaults to None.

setup_checkpoint_fn(checkpoint_fn: Callable | None = None) → None

Set up the function used to obtain the model checkpoint. It is called by `logger.save_checkpoint()`.

Parameters: checkpoint_fn (Optional[Callable]) – the hook function that returns the checkpoint dictionary. Defaults to None.

reset_data() → None: Reset the stored data.

store(tab: str | None = None, **kwargs) → None

Store any values to the current epoch buffer with prefix tab/.

Example use:

logger = BaseLogger(**logger_kwargs)
logger.store(tab="train", reward=1.0)

Parameters: tab (str) – the prefix of the logging data. Defaults to None.
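To illustrate the prefixing behaviour described above, here is a minimal sketch that buffers values under `tab/key` names. `ToyBuffer` is a hypothetical stand-in written for this example, not fsrl code, and the storage layout inside BaseLogger may differ.

```python
from collections import defaultdict
from typing import Optional


class ToyBuffer:
    """Hypothetical stand-in for a logger's per-epoch storage (not fsrl code)."""

    def __init__(self) -> None:
        # Each key maps to the list of values stored since the last reset.
        self.data = defaultdict(list)

    def store(self, tab: Optional[str] = None, **kwargs) -> None:
        for key, value in kwargs.items():
            # Prefix the key with "tab/" only when a tab is given.
            name = f"{tab}/{key}" if tab is not None else key
            self.data[name].append(value)


buffer = ToyBuffer()
buffer.store(tab="train", reward=1.0, cost=0.2)
buffer.store(tab="train", reward=3.0)
buffer.store(lr=0.001)
```

After these calls the sketch holds two values under `train/reward`, one under `train/cost`, and one under the unprefixed key `lr`.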

write(step: int, display: bool = False, display_keys: Iterable[str] | None = None) → None

Write the stored data to the logging backend, then reset the stored data.

Parameters:

step (int) – the current training step or epoch.
display (bool) – whether to print the logged data in the terminal. Defaults to False.
display_keys (Iterable[str]) – the keys to be printed. If None, print all stored keys. Defaults to None.

write_without_reset(*args, **kwarg) → None: Write the data without resetting the currently stored stats. Intended for the tensorboard and wandb loggers.
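The difference between the two write methods can be sketched with a small in-memory logger. `ToyLogger` and its `rows` list are hypothetical names for this example. That write() reuses write_without_reset(), and that values are aggregated by their mean, are assumptions rather than a description of fsrl's internals.

```python
from collections import defaultdict
from statistics import fmean


class ToyLogger:
    """Hypothetical in-memory logger illustrating the two write modes."""

    def __init__(self) -> None:
        self.data = defaultdict(list)  # values buffered for the current epoch
        self.rows = []                 # stands in for a progress file or event file

    def store(self, **kwargs) -> None:
        for key, value in kwargs.items():
            self.data[key].append(value)

    def write_without_reset(self, step: int) -> None:
        # Record the per-key means but keep the buffer intact.
        row = {key: fmean(values) for key, values in self.data.items()}
        self.rows.append((step, row))

    def write(self, step: int) -> None:
        # Record the means, then clear the buffer for the next epoch.
        self.write_without_reset(step)
        self.data.clear()


toy = ToyLogger()
toy.store(loss=2.0)
toy.store(loss=4.0)
toy.write_without_reset(step=1)  # buffer still holds both values
toy.store(loss=6.0)
toy.write(step=2)                # buffer is emptied afterwards
```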

save_checkpoint(suffix: int | str | None = None) → None

Use the writer to log metadata when save_checkpoint_fn is called in the trainer.

Parameters: suffix (Optional[Union[int, str]]) – the suffix to be added to the stored checkpoint name. Defaults to None.

save_config(config: dict, verbose=True) → None

Log an experiment configuration.

Call this once at the top of your experiment, passing in all important config variables as a dict. The config is serialized to JSON, and anything that can’t be serialized is handled gracefully by writing as informative a string as possible.

Example use:

logger = BaseLogger(**logger_kwargs)
logger.save_config(locals())

Parameters:

config (dict) – the configs to be stored.
verbose (bool) – whether to print the saved configs. Defaults to True.
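One simple way to get this kind of graceful fallback uses only the standard library. The sketch below is an illustration, not fsrl's own conversion routine; `config_to_json` and `CustomEnv` are hypothetical names.

```python
import json


def config_to_json(config: dict) -> str:
    """Serialize a config dict, falling back to str() for unsupported values."""
    # `default=str` is called only for objects json cannot encode natively.
    return json.dumps(config, default=str, indent=2, sort_keys=True)


class CustomEnv:
    """Placeholder for a non-serializable object found in a config."""

    def __str__(self) -> str:
        return "CustomEnv()"


config = {"lr": 0.001, "task": "SafetyCarCircle-v0", "env": CustomEnv()}
text = config_to_json(config)
restored = json.loads(text)
```

Here the `env` entry survives as its string form, while plain numbers and strings round-trip unchanged.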

restore_data() → None: Return the metadata from an existing log. Not implemented for BaseLogger.

get_std(key: str) → float

Get the standard deviation of the queried data in storage.

Parameters: key (str) – the key of the queried data.
Returns: the standard deviation.

get_mean(key: str) → float

Get the mean of the queried data in storage.

Parameters: key (str) – the key of the queried data.
Returns: the mean.

get_mean_list(keys: Iterable[str]) → list

Get the list of mean values of the queried data in storage.

Parameters: keys (Iterable[str]) – the keys of the queried data.
Returns: the list of mean values.

get_mean_dict(keys: Iterable[str]) → dict

Get the dict of mean values of the queried data in storage.

Parameters: keys (Iterable[str]) – the keys of the queried data.
Returns: the dict of mean values.
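The four query methods above can be sketched as plain functions over a dictionary of buffered values. The `data` contents are made up for illustration, and the use of the population standard deviation is an assumption, not a statement about fsrl's implementation.

```python
from statistics import fmean, pstdev
from typing import Dict, Iterable, List

# Hypothetical buffer contents; in the logger these would come from store().
data: Dict[str, List[float]] = {
    "train/reward": [1.0, 2.0, 3.0],
    "train/cost": [0.0, 4.0],
}


def get_mean(key: str) -> float:
    return fmean(data[key])


def get_std(key: str) -> float:
    # Population standard deviation (assumed variant).
    return pstdev(data[key])


def get_mean_list(keys: Iterable[str]) -> List[float]:
    # Means are returned in the same order as the requested keys.
    return [get_mean(key) for key in keys]


def get_mean_dict(keys: Iterable[str]) -> Dict[str, float]:
    return {key: get_mean(key) for key in keys}
```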

property stats_mean: dict

property logger_keys: Iterable

display_tabular(display_keys: Iterable[str] | None = None) → None

Display the keys of interest in a tabular format.

Parameters: display_keys (Iterable[str]) – the keys to be displayed. If None, display all data. Defaults to None.

print(msg: str, color='green') → None

Print a colorized message to stdout.

Parameters:

msg (str) – the message to be printed.
color (str) – the color used for printing. The choices are `gray, red, green, yellow, blue, magenta, cyan, white, crimson`. Defaults to “green”.
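Colorized terminal output of this kind is usually produced with ANSI escape sequences. The sketch below shows the idea with the standard foreground codes 31 to 37. `colorize` and `COLOR_CODES` are hypothetical names, and gray and crimson are left out because their codes depend on the implementation.

```python
# Standard ANSI foreground color codes (a subset of the choices listed above).
COLOR_CODES = {
    "red": 31, "green": 32, "yellow": 33, "blue": 34,
    "magenta": 35, "cyan": 36, "white": 37,
}


def colorize(msg: str, color: str = "green") -> str:
    """Wrap msg in ANSI escape sequences so terminals render it in color."""
    code = COLOR_CODES[color]
    # "\x1b[<code>m" switches the color on; "\x1b[0m" resets all attributes.
    return f"\x1b[{code}m{msg}\x1b[0m"


print(colorize("training finished"))
print(colorize("cost limit exceeded", color="red"))
```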

TensorboardLogger

class fsrl.utils.TensorboardLogger(log_dir: str | None = None, log_txt: bool = True, name: str | None = None)

Bases: BaseLogger

A logger that uses the tensorboard SummaryWriter to visualize and log statistics.

Parameters:

log_dir (str) – the log directory. Defaults to None.
log_txt (bool) – whether to log data in log_dir in a file named progress.txt. Defaults to True.
name (str) – the experiment name. If None, the current time is used as the name. Defaults to None.

write(step: int, display: bool = True, display_keys: Iterable[str] | None = None) → None

Write the stored data to the tf event file, then reset the stored data.

Parameters:

step (int) – the current training step or epoch.
display (bool) – whether to print the logged data in the terminal. Defaults to True.
display_keys (Iterable[str]) – the keys to be printed. If None, print all stored keys. Defaults to None.

write_without_reset(step: int) → None: Write the data to the tf event file without resetting the currently stored stats.

restore_data() → Tuple[int, int, int]

Return the metadata from an existing log. If nothing is found or an error occurs during recovery, the default parameters are returned.

Returns: Tuple[int, int, int] – episode, env_step, gradient_step.

WandbLogger

class fsrl.utils.WandbLogger(config: dict = {}, project: str = 'fsrl', group: str = 'test', name: str | None = None, log_dir: str = 'log', log_txt: bool = True)

Bases: BaseLogger

Weights and Biases logger that sends data to https://wandb.ai/.

A typical usage example:

config = {...}
project = "test_cvpo"
group = "SafetyCarCircle-v0"
name = "default_param"
log_dir = "logs"

logger = WandbLogger(config, project, group, name, log_dir)
logger.save_config(config)

agent = CVPOAgent(env, logger=logger)
agent.learn(train_envs)

Parameters:

config (dict) – the experiment configurations. Defaults to an empty dict.
project (str) – the W&B project name. Defaults to “fsrl”.
group (str) – the W&B group name. Defaults to “test”.
name (str) – the W&B experiment run name. If None, the current time is used as the name. Defaults to None.
log_dir (str) – the log directory. Defaults to “log”.
log_txt (bool) – whether to log data in log_dir in a file named progress.txt. Defaults to True.

write(step: int, display: bool = True, display_keys: Iterable[str] | None = None) → None

Send the stored data to wandb, then reset the stored data.

Parameters:

step (int) – the current training step or epoch.
display (bool) – whether to print the logged data in the terminal. Defaults to True.
display_keys (Iterable[str]) – the keys to be printed. If None, print all stored keys. Defaults to None.

write_without_reset(step: int) → None: Send the data to wandb without resetting the currently stored stats.

restore_data() → None: Not implemented yet.

DummyLogger

class fsrl.utils.DummyLogger(*args, **kwarg)

Bases: BaseLogger

A logger that inherits from BaseLogger but does nothing. Used as a placeholder in the trainer.
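A do-nothing logger follows the null-object pattern: the caller can always invoke logging methods without first checking whether a logger was supplied. The sketch below illustrates that idea. `NoOpLogger` and `train_one_epoch` are hypothetical names, not part of fsrl.

```python
class NoOpLogger:
    """Hypothetical null-object logger: accepts any call and does nothing."""

    def __init__(self, *args, **kwargs) -> None:
        pass

    def store(self, *args, **kwargs) -> None:
        pass

    def write(self, *args, **kwargs) -> None:
        pass

    def save_checkpoint(self, *args, **kwargs) -> None:
        pass


def train_one_epoch(logger=None) -> float:
    # Falling back to the no-op logger avoids `if logger is not None` checks.
    logger = logger if logger is not None else NoOpLogger()
    loss = 0.5  # placeholder value standing in for a real training step
    logger.store(tab="loss", value=loss)
    logger.write(step=1)
    return loss


result = train_one_epoch()
```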

Net

class fsrl.utils.net.common.ActorCritic(actor: Module, critics: List | Module)

Bases: Module

An actor-critic network for parsing parameters.

Parameters:

actor (nn.Module) – the actor network.
critics (Union[List, nn.Module]) – the critic network, or a list of critic networks.

class fsrl.utils.net.continuous.DoubleCritic(preprocess_net1: Module, preprocess_net2: Module, hidden_sizes: Sequence[int] = (), device: str | int | torch.device = 'cpu', preprocess_net_output_dim: int | None = None, linear_layer: Type[nn.Linear] = nn.Linear, flatten_input: bool = True)

Bases: Module

Double critic network. Creates two critics operating in a continuous action space, each with the structure preprocess_net -> 1 (Q-value).

Parameters:

preprocess_net1 – a self-defined preprocess_net that outputs a flattened hidden state.
preprocess_net2 – a self-defined preprocess_net that outputs a flattened hidden state.
hidden_sizes – a sequence of ints for constructing the MLP after preprocess_net. Defaults to an empty sequence, in which case the MLP contains only a single linear layer.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
linear_layer – the module to use as the linear layer. Defaults to nn.Linear.
flatten_input (bool) – whether to flatten the input data for the last layer. Defaults to True.

For advanced usage (how to customize the network), please refer to tianshou’s build_the_network tutorial.

LagrangianOptimizer

class fsrl.utils.LagrangianOptimizer(pid: tuple = (0.05, 0.0005, 0.1))

Bases: object

Lagrangian multiplier optimizer based on the PID controller, according to https://proceedings.mlr.press/v119/stooke20a.html.

Parameters: pid (tuple) – the coefficients of the PID controller: kp, ki, kd.

Note

If kp and kd are 0, it reduces to a standard SGD-based Lagrangian optimizer.

step(value: float, threshold: float) → None

Optimize the multiplier by one step.

Parameters:

value (float) – the current value estimation.
threshold (float) – the threshold of the value.

get_lag() → float: Get the Lagrangian multiplier.

state_dict() → dict: Get the parameters of this Lagrangian optimizer.

load_state_dict(params: dict) → None: Load the parameters to continue training.
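A PID-controlled multiplier update can be sketched in plain Python as follows. `ToyPIDLagrangian` is a hypothetical class written for this example. Clipping the integral term, the derivative term, and the multiplier at zero is an assumption modelled on the general PID Lagrangian idea and may not match fsrl's implementation exactly.

```python
class ToyPIDLagrangian:
    """Hypothetical PID multiplier update, written for illustration only."""

    def __init__(self, pid: tuple = (0.05, 0.0005, 0.1)) -> None:
        self.kp, self.ki, self.kd = pid
        self.error_old = 0.0
        self.error_integral = 0.0
        self.lagrangian = 0.0

    def step(self, value: float, threshold: float) -> None:
        error = value - threshold  # positive when the constraint is violated
        # Integral and derivative terms are clipped at zero (assumed here).
        self.error_integral = max(0.0, self.error_integral + error)
        error_diff = max(0.0, error - self.error_old)
        self.error_old = error
        # The multiplier itself must stay non-negative.
        self.lagrangian = max(
            0.0,
            self.kp * error + self.ki * self.error_integral + self.kd * error_diff,
        )

    def get_lag(self) -> float:
        return self.lagrangian


opt = ToyPIDLagrangian(pid=(0.0, 0.1, 0.0))  # kp = kd = 0: integral term only
opt.step(value=30.0, threshold=25.0)  # violated by 5
first = opt.get_lag()
opt.step(value=30.0, threshold=25.0)  # still violated, the multiplier grows
second = opt.get_lag()
opt.step(value=0.0, threshold=25.0)   # satisfied, the multiplier drops to 0
third = opt.get_lag()
```

With kp and kd set to 0, only the accumulated violation drives the multiplier, which corresponds to the reduced case mentioned in the note above.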

ExperimentUtils

fsrl.utils.exp_util.seed_all(seed=1029, others: list | None = None) → None

Fix the seeds of random, numpy, torch, and any objects passed in others.

Parameters:

seed (int) – the random seed. Defaults to 1029.
others (Optional[list]) – other objects to be seeded. Defaults to None.
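The seeding idea can be sketched with the standard library alone. `seed_everything` is a hypothetical helper, not the fsrl function: it skips torch, seeds numpy only when it is installed, and assumes each extra object exposes a `seed(int)` method.

```python
import random
from typing import Optional


def seed_everything(seed: int = 1029, others: Optional[list] = None) -> None:
    """Hypothetical helper illustrating the idea behind seed_all."""
    random.seed(seed)
    try:
        import numpy as np  # optional: seeded only when installed
        np.random.seed(seed)
    except ImportError:
        pass
    # Assumption: each extra object exposes a `seed(int)` method.
    for obj in others or []:
        if hasattr(obj, "seed"):
            obj.seed(seed)


rng = random.Random()  # an independent generator passed via `others`
seed_everything(7, others=[rng])
first_draw = (random.random(), rng.random())
seed_everything(7, others=[rng])
second_draw = (random.random(), rng.random())
```

Re-seeding with the same value reproduces the same draws from both the global generator and the extra object.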

fsrl.utils.exp_util.load_config_and_model(path: str, best: bool = False)

Load the configuration and trained model from a specified directory.

Parameters:

path – the directory path where the configuration and trained model are stored.
best – whether to load the best-performing model or the most recent one. Defaults to False.

Returns:

a tuple containing the configuration dictionary and the trained model.

Raises:

ValueError – if the specified directory does not exist.

fsrl.utils.exp_util.to_string(values)

Recursively convert a sequence or dictionary of values to a string representation.

Parameters: values – the sequence or dictionary of values to be converted to a string.
Returns: a string representation of the input values.
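A recursive conversion of this kind might look like the sketch below. `to_string_sketch` is a hypothetical name. The `_` separator, the sorted key order, and rendering only dictionary values are assumptions, so the output of fsrl's own function may differ.

```python
from collections.abc import Mapping, Sequence


def to_string_sketch(values) -> str:
    """Recursively flatten nested sequences and dicts into one string."""
    if isinstance(values, Mapping):
        # Sort keys for deterministic output; only the values are rendered.
        return "_".join(to_string_sketch(values[k]) for k in sorted(values))
    if isinstance(values, Sequence) and not isinstance(values, str):
        return "_".join(to_string_sketch(v) for v in values)
    # Base case: scalars and plain strings.
    return str(values)


hidden = to_string_sketch([128, 128])
nested = to_string_sketch({"b": 2, "a": [1, 0.5]})
```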

fsrl.utils.exp_util.auto_name(default_cfg: dict, current_cfg: dict, prefix: str = '', suffix: str = '', skip_keys: list = ['task', 'reward_threshold', 'logdir', 'worker', 'project', 'group', 'name', 'prefix', 'suffix', 'save_interval', 'render', 'verbose', 'save_ckpt', 'training_num', 'testing_num', 'epoch', 'device', 'thread'], key_abbre: dict = {'cost_limit': 'cost', 'estep_dual_lr': 'elr', 'estep_iter_num': 'enum', 'estep_kl': 'ekl', 'mstep_dual_lr': 'mlr', 'mstep_iter_num': 'mnum', 'mstep_kl_mu': 'kl_mu', 'mstep_kl_std': 'kl_std', 'update_per_step': 'update'}) → str

Automatically generate the experiment name by comparing the current config with the default one.

Parameters:

default_cfg (dict) – a dictionary containing the default configuration values.
current_cfg (dict) – a dictionary containing the current configuration values.
prefix (str) – (optional) a string to be added at the beginning of the generated name.
suffix (str) – (optional) a string to be added at the end of the generated name.
skip_keys (list) – (optional) a list of keys to be skipped when generating the name.
key_abbre (dict) – (optional) a dictionary containing abbreviations for keys in the generated name.

Returns:

str – the generated experiment name.
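The naming scheme described above can be sketched as follows. `auto_name_sketch` is a hypothetical function. The `_` separator, the key-then-value format, and the "default" fallback are assumptions, so names produced by fsrl may look different.

```python
def auto_name_sketch(default_cfg, current_cfg, prefix="", suffix="",
                     skip_keys=(), key_abbre=None) -> str:
    """Build a run name from the keys whose values differ from the defaults."""
    key_abbre = key_abbre or {}
    parts = [prefix] if prefix else []
    for key in sorted(default_cfg):
        value = current_cfg.get(key, default_cfg[key])
        if key in skip_keys or value == default_cfg[key]:
            continue
        # Use the abbreviation when one is registered for this key.
        parts.append(f"{key_abbre.get(key, key)}{value}")
    if suffix:
        parts.append(suffix)
    # Fall back to a fixed label when nothing differs (an assumption).
    return "_".join(parts) if parts else "default"


default_cfg = {"cost_limit": 10, "lr": 0.001, "task": "SafetyCarCircle-v0"}
current_cfg = {"cost_limit": 20, "lr": 0.001, "task": "my-custom-task"}
name = auto_name_sketch(
    default_cfg, current_cfg, prefix="cvpo",
    skip_keys=["task"], key_abbre={"cost_limit": "cost"},
)
```

Only `cost_limit` differs and is not skipped, so the resulting name combines the prefix with the abbreviated key and its new value.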