Web Navigation

This environment is included in A2Perf.


Description

The web navigation environment aims to enable the development of compositional tasks that can be represented by a dependency graph. Using Compositional Design of Environments (CoDE), proposed by Google Research in ‘Environment Generation for Zero-Shot Compositional Reinforcement Learning’, websites are generated automatically, after which the policy has to complete the generated webpages.

Action Space

The action should be passed as a scalar, and two types of actions are possible. First, abstract navigation lets the agent refer directly to an element, and the profile is irrelevant. In this case, the action is converted to a tuple. To use abstract navigation, pass use_conceptual=True when initializing the environment. Second, the action can refer to a pair of a DOM element and a profile field. The agent then enters the value of the selected profile key into the selected DOM element.

For example, with abstract navigation, action=5 refers to the element at index 5 in the DOM tree, where the tree is linearized using the get_dom_elements function. Without abstract navigation, action=5 encodes both an element index and a profile index, i.e., the pair (element_index, profile_index), where action = profile_index * number_of_dom_elements + element_index.
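The flattened encoding used without abstract navigation can be sketched in Python as follows. The helpers encode_action and decode_action are hypothetical illustrations of the formula, not part of the A2Perf API, and number_of_dom_elements is assumed to be the length of the list returned by get_dom_elements.

```python
def encode_action(element_index: int, profile_index: int, number_of_dom_elements: int) -> int:
    """Flatten an (element, profile field) pair into one scalar action (hypothetical helper)."""
    if not 0 <= element_index < number_of_dom_elements:
        raise ValueError("element_index out of range")
    return profile_index * number_of_dom_elements + element_index


def decode_action(action: int, number_of_dom_elements: int) -> tuple:
    """Recover (element_index, profile_index) from a flattened action (hypothetical helper)."""
    profile_index, element_index = divmod(action, number_of_dom_elements)
    return element_index, profile_index


# Example: a page whose linearized DOM tree has 10 elements.
action = encode_action(element_index=5, profile_index=2, number_of_dom_elements=10)
print(action)                     # 25
print(decode_action(action, 10))  # (5, 2)
print(decode_action(5, 10))       # (5, 0): element 5 paired with profile field 0
```

Because element_index is always smaller than number_of_dom_elements, divmod recovers both indices unambiguously.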

Observation Space

When performing a step in the environment, the observation can be returned either as the raw state or as a wrapped state. To return the raw state, pass raw_state=True; by default, the observation is wrapped.
The wrapped observation is a dictionary with the following keys, each of which maps to an array: profile_key, profile_value, profile_key_mask, profile_value_mask, dom_elements, dom_profile_joint_mask, time_step, dom_attribute_mask, dom_profile_intersection, dom_profile_intersection_mask, dom_features.

  • profile_key, profile_value, profile_key_mask, profile_value_mask: The profile arrays are 2D arrays with the shape (number of fields, sequence length). The first keys of the wrapped observation relate to the profile, while the last few relate to the DOM tree. The profile of the webpage is the user profile, which contains the keys and values that need to be filled in. When booking flights, this could be {"Departure Date": "Friday", "Destination Airport": "Los Angeles (LAX)"}. These keys and values are then embedded into vectors. Because all keys and values are embedded into fixed-length vectors, shorter embeddings must be padded. This is where the masks come in: a mask contains ones where the embedding corresponds to the original key or value, and zeros where the embedding is padding.

  • dom_elements, dom_elements_mask, dom_attribute_mask, dom_features: The webpage itself is returned as DOM elements, which are also embedded. Examples of DOM elements are <div> and <header>.

  • dom_profile_intersection, dom_profile_intersection_mask: The intersection between the profile and the DOM elements is embedded and returned. For the tokens of each profile field key and value (such as ["first", "name"]) and the attribute tokens of each element (such as ["initial", "name", ":"]), the overlapping tokens are embedded. The intersection is a 5D tensor of shape (number of elements, max number of attributes, number of profile fields, number of action types (2), max sequence length).

  • time_step: The time step is the number of steps taken divided by the maximum number of steps allowed.
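A minimal sketch of the padding-and-mask idea and of the token overlap behind the intersection features follows. It works on plain Python lists of token ids and token strings rather than embeddings; the names pad_with_mask, overlapping_tokens and PAD_ID are hypothetical, and the environment itself builds fixed-shape arrays rather than lists.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical id reserved for padding


def pad_with_mask(token_ids: List[int], max_len: int) -> Tuple[List[int], List[int]]:
    """Pad (or truncate) token ids to max_len and build the matching mask.

    The mask holds 1 where a position comes from the original sequence and
    0 where it is padding.
    """
    ids = token_ids[:max_len]
    num_pad = max_len - len(ids)
    return ids + [PAD_ID] * num_pad, [1] * len(ids) + [0] * num_pad


def overlapping_tokens(profile_tokens: List[str], attribute_tokens: List[str]) -> List[str]:
    """Return the profile tokens that also appear among an element's attribute tokens."""
    attribute_set = set(attribute_tokens)
    return [tok for tok in profile_tokens if tok in attribute_set]


ids, mask = pad_with_mask([17, 42], max_len=5)
print(ids)   # [17, 42, 0, 0, 0]
print(mask)  # [1, 1, 0, 0, 0]
print(overlapping_tokens(["first", "name"], ["initial", "name", ":"]))  # ['name']
```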

When raw_state=True, the raw state is returned as a MiniWoBState object. This object stores the raw website information, which can be accessed with the following attributes:
  • obs.utterance: returns the task utterance, a dictionary providing the profile key and value pairs.

  • obs.phrase: returns the Phrase object of the utterance.

  • obs.tokens: returns the tokens used for the encoding of the utterance.

  • obs.fields: returns the key-value pairs extracted from the utterance.

  • obs.dom: returns the root DOM structure.

  • obs.dom_elements: returns a flattened list of all DOM elements.

Rewards

At every timestep, the agent receives a small penalty to encourage efficient navigation. Next, a potential-based positive reward is given to the agent after successfully linking a key to a field and entering the correct value. Lastly, the agent receives a task success reward (1.0 or -1.0) when the final state is reached or a time out is encountered.

To give an example, consider booking flights where the agent has the following profile: {"Departure Date": "Friday", "Destination Airport": "Los Angeles (LAX)"}, which has two fields (Departure Date and Destination Airport). The agent starts by picking a field and tries to find the corresponding text box on the page. The value of that field (e.g., "Los Angeles (LAX)" for Destination Airport) is then typed into the text box. If this is correct, the agent receives a positive reward of 1/2, where the denominator, 2, is the number of fields in the profile.
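The reward structure can be illustrated with a small sketch, assuming that the potential is the fraction of profile fields whose correct value has been entered, so that each correctly filled field adds 1/(number of fields), as in the flight example. The functions potential and step_reward and the step_penalty argument are hypothetical, and the sketch simplifies "filled fields" to a dictionary keyed by profile key; the actual penalty value used by A2Perf is not stated here.

```python
from typing import Dict

Profile = Dict[str, str]


def potential(filled: Profile, profile: Profile) -> float:
    """Fraction of profile fields whose correct value has been entered (assumed potential)."""
    correct = sum(1 for key, value in profile.items() if filled.get(key) == value)
    return correct / len(profile)


def step_reward(prev_filled: Profile, new_filled: Profile, profile: Profile,
                step_penalty: float, done: bool = False, success: bool = False) -> float:
    """Step penalty + change in potential + terminal success/failure reward.

    step_penalty is expected to be a small negative number (hypothetical value).
    """
    reward = step_penalty + potential(new_filled, profile) - potential(prev_filled, profile)
    if done:
        # Task success reward: +1.0 when the task is completed, -1.0 otherwise (e.g. time out).
        reward += 1.0 if success else -1.0
    return reward


profile = {"Departure Date": "Friday", "Destination Airport": "Los Angeles (LAX)"}
# Correctly typing one of the two fields raises the potential from 0 to 1/2.
r = step_reward({}, {"Destination Airport": "Los Angeles (LAX)"}, profile, step_penalty=0.0)
print(r)  # 0.5
```

Using a potential difference means that undoing a correct entry removes the reward it earned, so the agent cannot gain by repeatedly filling the same field.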

Starting State

Upon resetting the website, all episode fields are emptied and a clean webpage is returned.

Episode End

The episode can end under multiple conditions. First, if the number of steps is greater than the maximum allowed number of steps, the environment is terminated. The maximum number of steps can be set when initializing the environment; by default, the limit is 6 steps.
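The step limit and the normalized time_step observation can be tracked together, as in the following sketch. StepCounter is a hypothetical helper; it assumes the episode stops as soon as the number of steps taken reaches step_limit, so that no step beyond the limit is ever executed.

```python
from dataclasses import dataclass


@dataclass
class StepCounter:
    """Hypothetical helper tracking episode length against a step limit."""
    step_limit: int
    num_steps: int = 0

    def tick(self) -> bool:
        """Record one step; return True once the step budget is used up."""
        self.num_steps += 1
        return self.num_steps >= self.step_limit

    @property
    def time_step(self) -> float:
        """Steps taken divided by the maximum number of steps allowed."""
        return self.num_steps / self.step_limit


counter = StepCounter(step_limit=6)
flags = [counter.tick() for _ in range(6)]
print(flags)              # [False, False, False, False, False, True]
print(counter.time_step)  # 1.0
```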

Arguments

When creating the web navigation environment, some parameters must be defined and others are optional. First, the number of websites to create must be set with the num_websites parameter. Next, either a difficulty for the websites or a list of designs must be provided: when creating the environment, specify difficulty or designs, not both.

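import gymnasium as gym
import a2perf.domains.web_navigation  # makes the A2Perf web navigation environments available

# Optional parameters (listed below) can be added as further keyword arguments.
env = gym.make('WebNavigation-v0', num_websites=1, difficulty=1)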

Required parameters:

  • num_websites (int, default: None): The number of websites to be created.

  • difficulty (optional int, default: 1): Defines the difficulty of the webpage(s) being created. A random agent has a >=50% chance of completing a level 1 website, a >=25% chance of completing a level 2 website, and a >=10% chance of completing a level 3 website. Define either difficulty or designs, not both.

  • designs (optional list[dict[str, Any]], default: None): The designs for a number of websites, where each website corresponds to one dictionary in the list. If you specify designs, the list must contain at least num_websites designs; num_websites of them are then randomly sampled. Define either difficulty or designs, not both.
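When designs is given, selecting num_websites of them could look like the sketch below. It assumes sampling without replacement (one reading of why at least num_websites designs are needed) and a seeded random.Random instance for reproducibility; sample_designs and the toy design dictionaries are hypothetical, and real designs follow the CoDE design format.

```python
import random
from typing import Any, Dict, List


def sample_designs(designs: List[Dict[str, Any]], num_websites: int,
                   seed: int = 0) -> List[Dict[str, Any]]:
    """Randomly pick num_websites designs without replacement (hypothetical helper)."""
    if len(designs) < num_websites:
        raise ValueError(
            f"Need at least {num_websites} designs, got {len(designs)}.")
    rng = random.Random(seed)  # local RNG so the selection is reproducible for a given seed
    return rng.sample(designs, num_websites)


# Toy placeholder designs; real design dictionaries have a different structure.
designs = [{"name": "page_a"}, {"name": "page_b"}, {"name": "page_c"}]
chosen = sample_designs(designs, num_websites=2, seed=0)
print(len(chosen))  # 2
```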

Optional parameters:

  • seed (int, default: 0): Defines the seed for the random number generator and the reset method.

  • data_dir (str, default: "a2perf/domains/web_navigation/environment_generation/data"): Path to the directory containing a ZIP file with JSON files that describe the difficulty levels for webpages.

  • global_vocabulary (Vocabulary, default: vocabulary_node.LockedThreadedVocabulary()): Gathers all characters and their corresponding tokens. It is used to create embeddings for the profile dictionaries and DOM dictionaries.

  • use_legacy_reset (bool, default: False): If True, the reset method returns only the observation. If False, both the observation and info are returned.

  • use_legacy_step (bool, default: False): If True, the step method returns (observation, reward, terminated or truncated, info). If False, it returns (observation, reward, terminated, truncated, info), i.e., terminated and truncated separately.

  • step_limit (int, default: 25): Defines the maximum number of steps that can be taken in the environment.

  • render_mode (str, default: image): Possible render modes are test, which saves screenshots of the website in the screenshots attribute; rgb_array, which returns a 3D array; and image, which returns a screenshot of the website.

  • raw_state (bool, default: False): If True, the raw observation is returned; otherwise, the observation is wrapped. For more information, see the Observation Space section above.

  • use_conceptual (bool, default: False): If True, the action space expects abstract navigation; otherwise, an action refers to a pair of a DOM element and a profile field.

  • **kwargs (dict, default: None): Additional keyword arguments. For reset, these can be passed through the options argument; this is not possible for step.
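The difference between the default and legacy return formats controlled by use_legacy_reset and use_legacy_step can be expressed as two small adapters. These are hypothetical helpers that only mirror the return shapes described for those parameters; they are not part of A2Perf or Gymnasium.

```python
from typing import Any, Dict, Tuple

# Default (non-legacy) step result: (observation, reward, terminated, truncated, info).
StepResult = Tuple[Any, float, bool, bool, Dict[str, Any]]


def to_legacy_step(result: StepResult) -> Tuple[Any, float, bool, Dict[str, Any]]:
    """Collapse terminated and truncated into one done flag (legacy 4-tuple)."""
    obs, reward, terminated, truncated, info = result
    return obs, reward, terminated or truncated, info


def to_legacy_reset(result: Tuple[Any, Dict[str, Any]]) -> Any:
    """Drop the info dict and keep only the observation (legacy reset)."""
    obs, _info = result
    return obs


print(to_legacy_step(("obs", -0.1, False, True, {})))  # ('obs', -0.1, True, {})
print(to_legacy_reset(("obs", {})))                     # obs
```

Keeping terminated and truncated separate (the default) lets a learner distinguish a genuine end of the task from a cut-off caused by step_limit, which the legacy single flag cannot express.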

Version History

  • v0: Initial version release