From owner-chemistry@ccl.net Fri Mar 15 11:24:00 2019
From: "Thomas Manz thomasamanz[*]gmail.com"
To: CCL
Subject: CCL: Re: CCL: Re: CCL: “Survivorship bias” in science and the Marcel Swart DFT poll
Message-Id: <-53651-190315094728-20492-212JmnH0TzufYRRwBEyN/A/a\server.ccl.net>
X-Original-From: Thomas Manz
Content-Type: multipart/alternative; boundary="00000000000037a8f305842247eb"
Date: Fri, 15 Mar 2019 07:47:10 -0600
MIME-Version: 1.0

Sent to CCL by: Thomas Manz [thomasamanz*gmail.com]
--00000000000037a8f305842247eb
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi,

Regarding users' preferences for DFT functionals, accuracy is no doubt an important consideration, but ease of use is also a consideration.
There are many factors that go into a user's choice of functional:

(a) How hard is it to converge? Many meta-GGAs are more difficult to converge than GGAs.

(b) How computationally expensive is it? For planewave basis sets, the hybrid functionals are more computationally expensive than the GGAs.

(c) How easily available is it? Some functionals are more widely available in codes than others.

(d) How accurate is the functional across different material classes for a wide range of properties? For example, does it give good results for magnetic materials? Does it work for metallic conductors? Does it give good molecular properties? Some of the benchmark datasets are heavily weighted towards molecules, especially systems containing fewer than 100 atoms, but people also care about large systems. So, if a functional works great for small systems, but not for large extended materials, then it may not be popular even if it performs well on many of the benchmark datasets.

(e) How familiar/popular is the functional? How many other people use it? A functional that has broad use probably has a lot of things going for it. A person who is new to the field is probably going to start by considering the functionals most other people are using.

(f) How theoretically appealing is the functional? Does it have many empirically fitted parameters or few of them? Functionals that have >50 empirically adjustable parameters are scary. Even if they give accurate results for benchmark datasets, they scare users into thinking they have been over-parameterized. A functional that doesn't reproduce the energetics as accurately, but has only 1 to 4 well-reasoned parameter values, is going to be more "comfortable" for users.

(g) How long has the functional been around? Older functionals have had more time to pick up new users. A functional released during the last 5 years, even if it is a great one, hasn't had much time to gain adoption yet.

(h) Does it work well for large materials? For example, can it handle calculations on a material containing 5000 atoms in the unit cell?

So, if you want to understand the popularity of different DFT functionals, you have to consider all of these factors.

Tom

On Thu, Mar 14, 2019 at 2:52 PM Lehtola, Susi susi.lehtola=-=helsinki.fi <owner-chemistry=-=ccl.net> wrote:

>
> Sent to CCL by: "Lehtola, Susi" [susi.lehtola###helsinki.fi]
> On 3/13/19 8:23 PM, Grigoriy Zhurko reg_zhurko!^!chemcraftprog.com wrote:
> >
> > Sent to CCL by: Grigoriy Zhurko [reg_zhurko*_*chemcraftprog.com]
> >
> >
> > I have heard that the science nowadays experiences a crisis related
> > to the reproduction of previous results:
> >
> > https://arstechnica.com/science/2018/08/why-do-only-two-thirds-of-famous-social-science-results-replicate-its-complicated/
> >
> > For example, a study in 2015 was made in the field of psychology,
> > which showed that only one third of previous research results in
> > psychology could be replicated.
> >
> > This problem is related to the “Survivorship bias”:
> >
> > https://en.wikipedia.org/wiki/Survivorship_bias
> >
> > Quantum chemists often encounter this problem. If a chemist has
> > computed some properties and got the agreement with the experiment,
> > his results are “publishable”, so he is able to publish them in a
> > reviewed journal. At the same time, if the results of his computation
> > disagree with the experiment, he often has to abandon his results.
> > So, when a DFT functional review is based on a citation analysis,
> > this review is usually biased as noted above.
>
> There *are* density functional reviews based on actual accuracy data,
> see e.g. doi:10.1080/00268976.2017.1333644
>
> > The way to avoid the survivorship bias in chemistry is to
> > significantly take into account the negative results together with
> > the positive ones. There is an analogy with a video on Youtube: it
> > is evidently good that the youtube users are able to put dislikes
> > along with the likes, and I could show some video clips which
> > illustrate that the ratio between the number of likes and the number
> > of dislikes is often a much better indicator of the clip quality
> > than the number of likes or the number of views. As far as I know, at
> > the moment only the Marcel Swart DFT poll is a kind of research on
> > the reliability of DFT functionals, which is based on using negative
> > results (“dislikes”) together with the positive results (“likes”).
> > That’s why I think that the community should support the work of Mr.
> > Swart, although he should make it more rigorous (I wrote some details
> > about this earlier).
>
> But since there are way more users than developers, the poll is biased
> since most users have not heard of any new functionals since PBE or B3LYP,
> and survivor bias is even more inherent among the users! For instance,
> the winners of the poll from 2010 to 2017 show astounding variety:
>   PBE0, PBE0, PBE, PBE, PBE, PBE, PBE, PBE0
>
> Goerigk and Mehta write in their recent paper: "It is striking that even
> in the 2017 poll one third of the first-division methods belong to the
> 16 worst DFA approaches for GMTKN55!"
>
> For more discussion on the DFT poll, see the paper "A Trip to the
> Density Functional Theory Zoo: Warnings and Recommendations for the
> User", doi:10.1071/CH19023.
>
> (No, I don't want to start yet another flame war on the poll.)
> --
> ------------------------------------------------------------------
> Mr. Susi Lehtola, PhD             Junior Fellow, Adjunct Professor
> susi.lehtola%a%helsinki.fi        University of Helsinki
> http://susilehtola.github.io/     Finland
> ------------------------------------------------------------------
> Susi Lehtola, Docent, PhD         Postdoctoral Researcher
> susi.lehtola%a%helsinki.fi        University of Helsinki
> http://susilehtola.github.io/
> ------------------------------------------------------------------
-= This is automatically added to each message by the mailing script =-
E-mail to subscribers: CHEMISTRY=-=ccl.net or use:
      http://www.ccl.net/cgi-bin/ccl/send_ccl_message

E-mail to administrators: CHEMISTRY-REQUEST=-=ccl.net or use
      http://www.ccl.net/cgi-bin/ccl/send_ccl_message
      http://www.ccl.net/chemistry/sub_unsub.shtml

Before posting, check wait time at: http://www.ccl.net

Job: http://www.ccl.net/jobs
Conferences: http://server.ccl.net/chemistry/announcements/conferences/

Search Messages: http://www.ccl.net/chemistry/searchccl/index.shtml
      http://www.ccl.net/spammers.txt

RTFI: http://www.ccl.net/chemistry/aboutccl/instructions/


--00000000000037a8f305842247eb--

From owner-chemistry@ccl.net Fri Mar 15 11:59:00 2019
From: "Thomas Manz thomasamanz()gmail.com"
To: CCL
Subject: CCL: Re: CCL: Re: CCL: “Survivorship bias” in science and the Marcel Swart DFT poll
Message-Id: <-53652-190315110909-12180-p85mjlaaDXVpalte51sliQ-#-server.ccl.net>
X-Original-From: Thomas Manz
Content-Type: multipart/alternative; boundary="0000000000004d304f0584236b20"
Date: Fri, 15 Mar 2019 09:08:50 -0600
MIME-Version: 1.0

Sent to CCL by: Thomas Manz [thomasamanz]![gmail.com]
--0000000000004d304f0584236b20
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Susi,

Thank you for letting us know about this new paper. It is quite interesting.

Tom
--0000000000004d304f0584236b20--