Web Navigation¶
This environment is included in A2Perf.
Description¶
The web navigation environment aims to enable the development of compositional tasks that can be represented by a dependency graph. Using Compositional Design of Environments (CoDE), proposed by Google Research in ‘Environment Generation for Zero-Shot Compositional Reinforcement Learning’, websites are generated automatically, after which the policy has to complete the proposed webpages.
Action Space¶
The action should be passed as a scalar. Two types of actions are possible. Firstly, abstract navigation lets the action refer directly to an element, and the profile is irrelevant. In this case, the action is converted to a tuple. To use abstract navigation, pass use_conceptual=True when initializing the environment. Secondly, the action can refer to a pair of elements and profile fields. The agent will then enter the value of the profile key corresponding to the selected DOM element.
For example, with abstract navigation, action=5 refers to the 5th element in the DOM tree, where the tree is linearized using the get_dom_elements function. Without abstract navigation, ‘5’ refers to both profile and element indices, i.e., (element_index, profile_index), where action=profile_index*number_of_dom_elements+element_index.
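The index arithmetic above can be sketched in a few lines. This is only an illustration of the formula: the function names decode_action and encode_action are hypothetical and not part of the environment's API, and num_dom_elements is assumed to be the length of the linearized DOM element list.

```python
def encode_action(element_index: int, profile_index: int, num_dom_elements: int) -> int:
    """Combine an element index and a profile index into one scalar action."""
    return profile_index * num_dom_elements + element_index


def decode_action(action: int, num_dom_elements: int) -> tuple[int, int]:
    """Split a scalar action back into (element_index, profile_index)."""
    # divmod returns (quotient, remainder): the quotient is the profile index,
    # the remainder is the element index.
    profile_index, element_index = divmod(action, num_dom_elements)
    return element_index, profile_index


# With a hypothetical page of 4 DOM elements, action=5 selects
# element 1 and profile field 1, since 5 = 1 * 4 + 1.
element_index, profile_index = decode_action(5, num_dom_elements=4)
```

Under this encoding, the same scalar maps to different pairs on pages with a different number of DOM elements, so the element count has to be known when interpreting an action.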
Observation Space¶
Unless raw_state=True is set, the observation is wrapped by default. The wrapped observation contains the keys profile_key, profile_value, profile_key_mask, profile_value_mask, dom_elements, dom_profile_joint_mask, time_step, dom_attribute_mask, dom_profile_intersection, dom_profile_intersection_mask, dom_features, where the values are arrays.
- profile_key, profile_value, profile_key_mask, profile_value_mask: the keys and values of the profile, for example {"Departure Date": "Friday", "Destination Airport": "Los Angeles (LAX)"}. These keys and values are subsequently embedded into vectors. All keys and values are embedded in fixed-length vectors, so shorter embeddings have to be padded. This is where the masks come in: a mask contains ones where the embedding relates to the original key or value, and zeros where the embedding is padded.
- dom_elements, dom_elements_mask, dom_attribute_mask, dom_features: the DOM elements of the webpage, such as <div>, <header> etc.
- dom_profile_intersection, dom_profile_intersection_mask: for the tokens of each profile field (such as ["first", "name"]) and the attribute tokens of each element (such as ["initial", "name", ":"]), the overlapping tokens are embedded. The intersection is a 5D tensor of shape (number of elements, max number of attributes, number of profile fields, number of action types (2), max sequence length).
- time_step: the time step of the episode.
If raw_state=True, the raw state is returned as a MiniWoBState object. This object stores the raw website information, which can be accessed with the following attributes:
- obs.utterance: returns the task utterance, a dictionary providing the profile key and value pairs.
- obs.phrase: returns the Phrase object of the utterance.
- obs.tokens: returns the tokens used for the encoding of the utterance.
- obs.fields: returns the key-value pairs extracted from the utterance.
- obs.dom: returns the root DOM structure.
- obs.dom_elements: returns a flattened list of all DOM elements.
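The relationship between padded embeddings and their masks, as described for the profile keys and values, can be illustrated with a minimal sketch. The helper pad_with_mask, the token ids, and the padding id 0 are assumptions made for this example and do not come from the environment's code.

```python
def pad_with_mask(token_ids, max_len, pad_id=0):
    """Pad a token id sequence to max_len and return (padded, mask).

    The mask holds 1 for positions that come from the original sequence
    and 0 for positions that were added as padding.
    """
    ids = list(token_ids)[:max_len]  # truncate if the sequence is too long
    num_pad = max_len - len(ids)
    padded = ids + [pad_id] * num_pad
    mask = [1] * len(ids) + [0] * num_pad
    return padded, mask


# Hypothetical token ids for a two-token profile key, padded to length 4.
padded_key, key_mask = pad_with_mask([7, 3], max_len=4)
# padded_key is [7, 3, 0, 0] and key_mask is [1, 1, 0, 0]
```

A consumer of the observation can multiply by the mask, or use it to select positions, so that padded entries do not contribute to downstream computations.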
Rewards¶
At every timestep, the agent receives a small penalty to encourage efficient navigation. Next, a potential-based positive reward is given to the agent after successfully linking a key to a field and entering the correct value. Lastly, the agent receives a task success reward (1.0 or -1.0) when the final state is reached or a time out is encountered.
To give an example, consider booking flights where the agent has the following profile: {"Departure Date": "Friday", "Destination Airport": "Los Angeles (LAX)"}, which has two fields (Departure Date and Destination Airport). The agent starts by picking a field and tries to find the corresponding text box in the page. The value corresponding to the field (e.g. Destination Airport) is subsequently typed into the text box. If this is correct, the agent will receive a positive reward of 1/2, where the denominator, 2, is the number of fields in the profile.
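The per-field reward in this example can be written as a difference of potentials. The sketch below assumes that the potential is the fraction of profile fields filled in correctly, which is consistent with the 1/2 in the example but is not stated explicitly in the text. The function names are hypothetical, and the per-step penalty and the final task success reward are left out.

```python
def potential(num_correct_fields: int, num_fields: int) -> float:
    """Assumed potential: fraction of profile fields filled in correctly."""
    return num_correct_fields / num_fields


def shaping_reward(correct_before: int, correct_after: int, num_fields: int) -> float:
    """Potential-based reward for one step: potential after minus potential before."""
    return potential(correct_after, num_fields) - potential(correct_before, num_fields)


profile = {"Departure Date": "Friday", "Destination Airport": "Los Angeles (LAX)"}

# Filling in the first of two fields correctly raises the potential from 0 to 1/2.
reward = shaping_reward(correct_before=0, correct_after=1, num_fields=len(profile))
```

With this form, a step that does not change the number of correct fields yields a shaping reward of zero, so only the step penalty would apply on such steps.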
Starting State¶
Upon resetting the website, all episode fields are emptied and a clean webpage is returned.
Episode End¶
The episode can end under multiple conditions. Firstly, if the number of steps is greater than the maximum allowed number of steps, the environment is terminated. The maximum number of steps can be defined when initializing the environment. By default, the limit is 6 steps.
Arguments¶
When creating the web navigation environment, there are required parameters and optional parameters. Firstly, we have to define the number of websites that need to be created with the num_websites parameter. Next, we have to either specify a difficulty for the websites or provide a design. When creating the environment, we thus specify either difficulty or designs.
import gymnasium as gym
import a2perf.domains.web_navigation
env = gym.make('WebNavigation-v0', num_websites=1, difficulty=1, ...)
Required parameters:¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| num_websites | int | | The number of websites to be created. |
| difficulty | Optional, int | | Defines the difficulty of the webpage(s) that are being created. A random agent has a >=50% chance of completing a level 1 website, a >=25% chance of completing a level 2 website and a >=10% chance of completing a level 3 website. You either define difficulty or designs. |
| designs | Optional, list[dict[str, Any]] | | You can pass the design for a number of websites, where each website corresponds to one dictionary in the list. If you specify designs, note that you need to specify at least the number of websites defined with num_websites. |
Optional parameters:¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| | int | | Defines the seed for the random number generator and the |
| | str | | Path to the directory which contains a zipfile with json files describing difficulty levels for webpages. |
| global_vocabulary | Vocabulary | | The global_vocabulary gathers all characters and corresponding tokens. It is used to create embeddings for profile dictionaries and DOM dictionaries. |
| | bool | | If |
| | bool | | If |
| | int | | Defines the maximum number of steps that can be taken by the environment. |
| | str | | Possible render modes are |
| | bool | | If |
| use_conceptual | bool | | If true, the action space expects abstract navigation, else an action refers to a pair of elements and profile fields. |
| | dict | | On reset, you can pass kwargs in the options argument. This is not possible in step. |
Version History¶
v0: Initial version release