How to write a pseudo-algorithm with algorithm2e package inside a tcolorbox environment?
I've recently seen, in the second edition of Reinforcement Learning: An Introduction by Sutton and Barto, an appealing way to display pseudo-algorithms; below is an example image.
I think the environment is done with the tcolorbox package, and I think I should be able to make something similar. However, I like to put my pseudo-algorithm in an algorithm environment through the algorithm2e package. Is there a way to mix the two things? The ideal case would be to have the background color, the box, and the caption the same as in the image, with the internal structure coming from the algorithm environment of algorithm2e.
This is the code of what I've done so far:
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[ruled,longend]{algorithm2e}
\usepackage{textgreek}
\usepackage{amssymb}
\begin{document}
\begin{algorithm}
Algorithm parameters: step size $\alpha \in (0, 1]$, small $\epsilon > 0$\;
Initialize $Q(s, a)$, for all $s \in \mathcal{S}^+, a \in \mathcal{A}(s)$, arbitrarily except that $Q(\mathrm{terminal}, \cdot) = 0$\;
\ForEach{episode}{
  Initialize $S$\;
  \ForEach{step of episode}{
    Choose $A$ from $S$ using policy derived from $Q$ (e.g., \textepsilon-greedy)\;
    Take action $A$, observe $R$, $S'$\;
    $Q(S, A) \leftarrow Q(S, A) + \alpha [R + \gamma \max_a Q(S', a) - Q(S, A)]$\;
    $S \leftarrow S'$\;
  }
}
\caption{Q-learning (off-policy TD control) for estimating $\pi \approx \pi_*$}
\end{algorithm}
\end{document}
This is the result:
Basically, it's only the algorithm part. I don't know where to start changing its appearance, since the documentation offers very few commands to change the visual effect, and none of them seems helpful.
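For reference, the restyling commands the manual does provide only switch between a few predefined frame styles; a minimal sketch of them (style names taken from the algorithm2e manual), none of which gives a colored background or a title bar:

% The built-in appearance hooks only toggle predefined frames and rules.
\RestyleAlgo{boxruled}  % other predefined styles: plain, boxed, ruled, algoruled
\SetAlgoLined           % draw a vertical line along the body of each block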
Tags: tcolorbox, algorithm2e
asked Nov 23 at 19:13 by gvgramazio, edited Nov 23 at 19:56
Obviously, it is done using tcolorbox. – Bernard, Nov 23 at 19:14
@Bernard: I don't have the source; that's why I only stated that I think it's done with tcolorbox. – gvgramazio, Nov 23 at 19:17
2 Answers
Accepted answer (6 votes)
I missed the part of the algorithm2e manual where it is stated that the option H makes the environment non-floating, so it can be put inside a tcolorbox environment.
This is the updated code:
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[longend]{algorithm2e}
\usepackage{textgreek}
\usepackage{amssymb}
\usepackage{tcolorbox}
\begin{document}
\begin{tcolorbox}[fonttitle=\bfseries, title=Q-learning (off-policy TD control) for estimating $\pi \approx \pi_*$]
\begin{algorithm}[H]
Algorithm parameters: step size $\alpha \in (0, 1]$, small $\epsilon > 0$\;
Initialize $Q(s, a)$, for all $s \in \mathcal{S}^+, a \in \mathcal{A}(s)$, arbitrarily except that $Q(\mathrm{terminal}, \cdot) = 0$\;
\ForEach{episode}{
  Initialize $S$\;
  \ForEach{step of episode}{
    Choose $A$ from $S$ using policy derived from $Q$ (e.g., \textepsilon-greedy)\;
    Take action $A$, observe $R$, $S'$\;
    $Q(S, A) \leftarrow Q(S, A) + \alpha [R + \gamma \max_a Q(S', a) - Q(S, A)]$\;
    $S \leftarrow S'$\;
  }
}
\end{algorithm}
\end{tcolorbox}
\end{document}
This is the visual result:
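To get closer to the shaded look of the book, the standard tcolorbox keys colback and colframe can be added to the option list. A minimal sketch (the color values are guesses, not taken from the book):

\begin{tcolorbox}[fonttitle=\bfseries,
    colback=gray!10,  % light grey body, roughly like the book's shading
    colframe=black,   % frame color
    title=Q-learning (off-policy TD control) for estimating $\pi \approx \pi_*$]
  \begin{algorithm}[H]
    % ... algorithm body as above ...
  \end{algorithm}
\end{tcolorbox}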
Usually I don't post self-answered questions. This was a genuine question, but I found the answer five minutes after posting it. Since it could be helpful to others, I will leave it here.
answered Nov 23 at 19:32 – gvgramazio

You may prefer to change the title to reflect the integration of tcolorbox into algorithm2e. – Diaa, Nov 23 at 19:41
@Diaa: Done. Feel free to change it again if you have a better title. – gvgramazio, Nov 23 at 19:57
Answer (2 votes)
This is more a long comment than an answer, which was already provided by gvgramazio.
I just propose integrating the tcolorbox and algorithm environments into a new myalgorithm environment. Using the before upper and after upper options, the algorithm environment can easily be placed inside a tcolorbox. The new environment takes the title as its second parameter; the first parameter is optional and can be used to customize the box:
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[longend]{algorithm2e}
\usepackage{textgreek}
\usepackage{amssymb}
\usepackage{tcolorbox}

% #1 (optional): extra tcolorbox options; #2: the box title.
% The [2][] declaration makes the first parameter optional with an
% empty default, so plain \begin{myalgorithm}{<title>} also works.
\newtcolorbox{myalgorithm}[2][]{%
  fonttitle=\bfseries,
  title=#2,
  #1,
  before upper={\begin{algorithm}[H]},
  after upper={\end{algorithm}}
}

\begin{document}

\begin{myalgorithm}{Q-learning (off-policy TD control) for estimating $\pi \approx \pi_*$}
Algorithm parameters: step size $\alpha \in (0, 1]$, small $\epsilon > 0$\;
Initialize $Q(s, a)$, for all $s \in \mathcal{S}^+, a \in \mathcal{A}(s)$, arbitrarily except that $Q(\mathrm{terminal}, \cdot) = 0$\;
\ForEach{episode}{
  Initialize $S$\;
  \ForEach{step of episode}{
    Choose $A$ from $S$ using policy derived from $Q$ (e.g., \textepsilon-greedy)\;
    Take action $A$, observe $R$, $S'$\;
    $Q(S, A) \leftarrow Q(S, A) + \alpha [R + \gamma \max_a Q(S', a) - Q(S, A)]$\;
    $S \leftarrow S'$\;
  }
}
\end{myalgorithm}

\begin{myalgorithm}[sharp corners, colback=cyan!15]{Q-learning (off-policy TD control) for estimating $\pi \approx \pi_*$}
Algorithm parameters: step size $\alpha \in (0, 1]$, small $\epsilon > 0$\;
Initialize $Q(s, a)$, for all $s \in \mathcal{S}^+, a \in \mathcal{A}(s)$, arbitrarily except that $Q(\mathrm{terminal}, \cdot) = 0$\;
\ForEach{episode}{
  Initialize $S$\;
  \ForEach{step of episode}{
    Choose $A$ from $S$ using policy derived from $Q$ (e.g., \textepsilon-greedy)\;
    Take action $A$, observe $R$, $S'$\;
    $Q(S, A) \leftarrow Q(S, A) + \alpha [R + \gamma \max_a Q(S', a) - Q(S, A)]$\;
    $S \leftarrow S'$\;
  }
}
\end{myalgorithm}

\end{document}
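If numbered titles similar to algorithm2e's \caption are wanted, tcolorbox can also number the boxes itself. A sketch of a variant definition using tcolorbox's auto counter option (the environment name mynumberedalgorithm is made up for this example):

% Variant of myalgorithm whose title carries an automatic "Algorithm N:" prefix.
\newtcolorbox[auto counter]{mynumberedalgorithm}[2][]{%
  fonttitle=\bfseries,
  title={Algorithm~\thetcbcounter: #2},  % \thetcbcounter expands to the box number
  #1,
  before upper={\begin{algorithm}[H]},
  after upper={\end{algorithm}}
}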
answered Nov 26 at 16:26 – Ignasi