
Semantic Scholar: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Fixed typo
Tags: Mobile edit Mobile web edit
No edit summary
 
(36 intermediate revisions by 17 users not shown)
Line 5: Line 5:
| type = [[Search engine]]
| type = [[Search engine]]
| author = [[Allen Institute for Artificial Intelligence]]
| author = [[Allen Institute for Artificial Intelligence]]
| launch_date = {{start date|2015|11|2}}<ref>{{cite journal|last1=Jones|first1=Nicola|title=Artificial-intelligence institute launches free science search engine|journal=[[Nature (journal)|Nature]]|year=2015|issn=1476-4687|doi=10.1038/nature.2015.18703|s2cid=182440976 |doi-access=free}}</ref>
| launch_date = {{Start date and age|2015|11|2}}<ref>{{cite journal|last1=Jones|first1=Nicola|title=Artificial-intelligence institute launches free science search engine|journal=[[Nature (journal)|Nature]]|year=2015|issn=1476-4687|doi=10.1038/nature.2015.18703|s2cid=182440976 |doi-access=free}}</ref>
| website = {{url|https://semanticscholar.org}}
| website = {{URL|https://semanticscholar.org}}
}}
}}


'''Semantic Scholar''' is a research tool powered by [[artificial intelligence]] for scientific literature, It was developed at the [[Allen Institute for AI]] and publicly released in November 2015.<ref name="Eunjung Cha 3Nov2015">{{Cite news |first1=Ariana |last1=Eunjung Cha |date=3 November 2015 |title=Paul Allen's AI research group unveils program that aims to shake up how we search scientific knowledge. Give it a try. |url=https://www.washingtonpost.com/news/to-your-health/wp/2015/11/02/paul-allens-ai-research-group-unveils-program-that-aims-to-shake-up-how-we-search-scientific-knowledge-give-it-a-try/ |url-status=live |archive-url=https://web.archive.org/web/20191106162910/https://www.washingtonpost.com/news/to-your-health/wp/2015/11/02/paul-allens-ai-research-group-unveils-program-that-aims-to-shake-up-how-we-search-scientific-knowledge-give-it-a-try/ |archive-date=6 November 2019 |access-date=November 3, 2015 |newspaper=The Washington Post}}</ref> It uses advances in [[natural language processing]] to provide summaries for scholarly papers.<ref name="Hao 18Nov2020">{{Cite web |last=Hao |first=Karen |date=November 18, 2020 |title=An AI helps you summarize the latest in AI |url=https://www.technologyreview.com/2020/11/18/1012259/ai-summarizes-science-papers-ai2-semantic-scholar/ |access-date=2021-02-16 |website=MIT Technology Review |language=en}}</ref> The Semantic Scholar team is actively researching the use of artificial intelligence in [[natural language processing]], [[machine learning]], [[human–computer interaction]], and [[information retrieval]].<ref>{{Cite web|title=Semantic Scholar Research|url=https://research.semanticscholar.org/|access-date=2021-11-22|website=research.semanticscholar.org}}</ref>
'''Semantic Scholar''' is a research tool for scientific literature powered by [[artificial intelligence]]. It is developed at the [[Allen Institute for AI]] and was publicly released in November 2015.<ref name="Eunjung Cha 3Nov2015">{{Cite news |first1=Ariana |last1=Eunjung Cha |date=3 November 2015 |title=Paul Allen's AI research group unveils program that aims to shake up how we search scientific knowledge. Give it a try. |url=https://www.washingtonpost.com/news/to-your-health/wp/2015/11/02/paul-allens-ai-research-group-unveils-program-that-aims-to-shake-up-how-we-search-scientific-knowledge-give-it-a-try/ |url-status=live |archive-url=https://web.archive.org/web/20191106162910/https://www.washingtonpost.com/news/to-your-health/wp/2015/11/02/paul-allens-ai-research-group-unveils-program-that-aims-to-shake-up-how-we-search-scientific-knowledge-give-it-a-try/ |archive-date=6 November 2019 |access-date=November 3, 2015 |newspaper=The Washington Post}}</ref> Semantic Scholar uses modern techniques in [[natural language processing]] to support the research process, for example by providing automatically generated summaries of scholarly papers.<ref name="Hao 18Nov2020">{{Cite web |last=Hao |first=Karen |date=November 18, 2020 |title=An AI helps you summarize the latest in AI |url=https://www.technologyreview.com/2020/11/18/1012259/ai-summarizes-science-papers-ai2-semantic-scholar/ |access-date=2021-02-16 |website=MIT Technology Review |language=en}}</ref> The Semantic Scholar team is actively researching the use of artificial intelligence in [[natural language processing]], [[machine learning]], [[human–computer interaction]], and [[information retrieval]].<ref>{{Cite web|title=Semantic Scholar Research|url=https://research.semanticscholar.org/|access-date=2021-11-22|website=research.semanticscholar.org}}</ref>


Semantic Scholar began as a database for the topics of [[computer science]], [[geoscience]], and [[neuroscience]].<ref name=":0">{{Cite journal |last=Fricke|first=Suzanne|date=2018-01-12|title=Semantic Scholar|url=http://jmla.pitt.edu/ojs/jmla/article/view/280|journal=[[Journal of the Medical Library Association]]|language=en|volume=106|issue=1|pages=145–147|doi=10.5195/jmla.2018.280|s2cid=45802944|issn=1558-9439|doi-access=free}}</ref> In 2017 the system began including [[biomedical literature]] in its corpus.<ref name=":0" /> {{As of|2022|Sep}}, it includes over 200 million publications from all fields of science.<ref>{{cite news |last1=Matthews |first1=David |title=Drowning in the literature? These smart software tools can help |url=https://www.nature.com/articles/d41586-021-02346-4 |access-date=5 September 2022 |work=Nature |date=1 September 2021 |quote=...the publicly available corpus compiled by Semantic Scholar – a tool set up in 2015 by the Allen Institute for Artificial Intelligence in Seattle, Washington – amounting to around 200 million articles, including preprints.}}</ref>
Semantic Scholar began as a database for the topics of [[computer science]], [[geoscience]], and [[neuroscience]].<ref name=":0">{{Cite journal |last=Fricke|first=Suzanne|date=2018-01-12|title=Semantic Scholar|journal=[[Journal of the Medical Library Association]]|language=en|volume=106|issue=1|pages=145–147|doi=10.5195/jmla.2018.280|s2cid=45802944|issn=1558-9439|doi-access=free|pmc=5764585}}</ref> In 2017, the system began including [[biomedical literature]] in its corpus.<ref name=":0" /> {{As of|2022|Sep}}, it includes over 200 million publications from all fields of science.<ref>{{cite news |last1=Matthews |first1=David |title=Drowning in the literature? These smart software tools can help |url=https://www.nature.com/articles/d41586-021-02346-4 |access-date=5 September 2022 |work=Nature |date=1 September 2021 |quote=...the publicly available corpus compiled by Semantic Scholar – a tool set up in 2015 by the Allen Institute for Artificial Intelligence in Seattle, Washington – amounting to around 200 million articles, including preprints.}}</ref>


== Technology ==
== Technology ==
Semantic Scholar provides a one-sentence summary of [[scientific literature]]. One of its aims was to address the challenge of reading numerous titles and lengthy abstracts on mobile devices.<ref name="Grad 24Nov2020">{{Cite news |last=Grad |first=Peter |date=November 24, 2020 |title=AI tool summarizes lengthy papers in a sentence |url=https://techxplore.com/news/2020-11-ai-tool-lengthy-papers-sentence.html |access-date=2021-02-16 |work=Tech Xplore |language=en}}</ref> It also seeks to ensure that the three million scientific papers published yearly reach readers, since it is estimated that only half of this literature are ever read.<ref>{{Cite web |date=2019-10-23 |title=Allen Institute's Semantic Scholar now searches across 175 million academic papers |url=https://venturebeat.com/2019/10/23/allen-institutes-semantic-scholar-now-searches-across-175-million-academic-papers/ |access-date=2021-02-16 |website=VentureBeat |language=en-US}}</ref>
Semantic Scholar provides a one-sentence summary of [[scientific literature]]. One of its aims was to address the challenge of reading numerous titles and lengthy abstracts on mobile devices.<ref name="Grad 24Nov2020">{{Cite news |last=Grad |first=Peter |date=November 24, 2020 |title=AI tool summarizes lengthy papers in a sentence |url=https://techxplore.com/news/2020-11-ai-tool-lengthy-papers-sentence.html |access-date=2021-02-16 |work=Tech Xplore |language=en}}</ref> It also seeks to ensure that the three million scientific papers published yearly reach readers, since it is estimated that only half of this literature is ever read.<ref>{{Cite web |date=2019-10-23 |title=Allen Institute's Semantic Scholar now searches across 175 million academic papers |url=https://venturebeat.com/2019/10/23/allen-institutes-semantic-scholar-now-searches-across-175-million-academic-papers/ |access-date=2021-02-16 |website=VentureBeat |language=en-US}}</ref>


Artificial intelligence is used to capture the essence of a paper, generating it through an "abstractive" technique.<ref name="Hao 18Nov2020"/> The project uses a combination of [[machine learning]], [[natural language processing]], and [[machine vision]] to add a layer of [[semantic analysis (linguistics)|semantic analysis]] to the traditional methods of [[citation analysis]], and to extract relevant figures, [[table extraction|tables]], entities, and venues from papers.<ref name="Bohannon">{{Cite journal |last=Bohannon |first=John |date=11 November 2016 |title=A computer program just ranked the most influential brain scientists of the modern era |url=https://www.science.org/content/article/computer-program-just-ranked-most-influential-brain-scientists-modern-era |url-status=live |journal=[[Science (journal)|Science]] |doi=10.1126/science.aal0371 |archive-url=https://web.archive.org/web/20200429134813/https://www.sciencemag.org/news/2016/11/computer-program-just-ranked-most-influential-brain-scientists-modern-era |archive-date=29 April 2020 |access-date=12 November 2016}}</ref><ref>{{Cite Q | Q108172042 }}</ref>
Artificial intelligence is used to capture the essence of a paper, generating it through an "abstractive" technique.<ref name="Hao 18Nov2020"/> The project uses a combination of [[machine learning]], [[natural language processing]], and [[machine vision]] to add a layer of [[semantic analysis (linguistics)|semantic analysis]] to the traditional methods of [[citation analysis]], and to extract relevant figures, [[table extraction|tables]], entities, and venues from papers.<ref name="Bohannon">{{Cite journal |last=Bohannon |first=John |date=11 November 2016 |title=A computer program just ranked the most influential brain scientists of the modern era |url=https://www.science.org/content/article/computer-program-just-ranked-most-influential-brain-scientists-modern-era |url-status=live |journal=[[Science (journal)|Science]] |doi=10.1126/science.aal0371 |archive-url=https://web.archive.org/web/20200429134813/https://www.sciencemag.org/news/2016/11/computer-program-just-ranked-most-influential-brain-scientists-modern-era |archive-date=29 April 2020 |access-date=12 November 2016}}</ref><ref>{{Cite Q | Q108172042 }}</ref>
Line 20: Line 20:
Another key AI-powered feature is Research Feeds, an adaptive research recommender that uses AI to quickly learn what papers users care about reading and recommends the latest research to help scholars stay up to date. It uses a state-of-the-art paper embedding model trained using contrastive learning to find papers similar to those in each Library folder.<ref>{{Cite web |title=Semantic Scholar {{!}} Frequently Asked Questions |url=https://www.semanticscholar.org/faq#what-are-research-feeds |url-status=live|archive-date=July 15, 2023|archive-url=https://web.archive.org/web/20230715223949/https://www.semanticscholar.org/faq#what-are-research-feeds}}</ref>
Another key AI-powered feature is Research Feeds, an adaptive research recommender that uses AI to quickly learn what papers users care about reading and recommends the latest research to help scholars stay up to date. It uses a state-of-the-art paper embedding model trained using contrastive learning to find papers similar to those in each Library folder.<ref>{{Cite web |title=Semantic Scholar {{!}} Frequently Asked Questions |url=https://www.semanticscholar.org/faq#what-are-research-feeds |url-status=live|archive-date=July 15, 2023|archive-url=https://web.archive.org/web/20230715223949/https://www.semanticscholar.org/faq#what-are-research-feeds}}</ref>


Semantic Scholar also offers Semantic Reader, an augmented reader with the potential to revolutionize scientific reading by making it more accessible and richly contextual.<ref>{{Cite web |title=Semantic Scholar {{!}} Semantic Reader |url=https://www.semanticscholar.org/product/semantic-reader |url-status=live |website=Semantic Scholar|archive-url=https://web.archive.org/web/20230715224159/https://www.semanticscholar.org/product/semantic-reader|archive-date=July 15, 2023}}</ref> Semantic Reader provides in-line citation cards that allow users to see citations with TLDR summaries as they read and skimming highlights that capture key points of a paper so users can digest faster.
Semantic Scholar also offers Semantic Reader, an augmented reader with the potential to revolutionize scientific reading by making it more accessible and richly contextual.<ref>{{Cite web |title=Semantic Scholar {{!}} Semantic Reader |url=https://www.semanticscholar.org/product/semantic-reader |url-status=live |website=Semantic Scholar|archive-url=https://web.archive.org/web/20230715224159/https://www.semanticscholar.org/product/semantic-reader|archive-date=July 15, 2023}}</ref> Semantic Reader provides in-line citation cards that allow users to see citations with [[TL;DR|TLDR]] (short for Too Long, Didn't Read) automatically generated short summaries as they read and skimming highlights that capture key points of a paper so users can digest faster.


In contrast with [[Google Scholar]] and [[PubMed]], Semantic Scholar is designed to highlight the most important and influential elements of a paper.<ref>{{Cite web|url=https://ijlls.org/index.php/ijlls/announcement/view/1|title=Semantic Scholar
In contrast with [[Google Scholar]] and [[PubMed]], Semantic Scholar is designed to highlight the most important and influential elements of a paper.<ref>{{Cite web|url=https://ijlls.org/index.php/ijlls/announcement/view/1|title=Semantic Scholar
|website=International Journal of Language and Literary Studies|access-date=2021-11-09}}</ref> The AI technology is designed to identify hidden connections and links between research topics.<ref>{{Cite book|last=Baykoucheva|first=Svetla|title=Driving Science Information Discovery in the Digital Age|publisher=Chandos Publishing|year=2021|isbn=978-0-12-823724-3|pages=91|language=en}}</ref> Like the previously cited search engines, Semantic Scholar also exploits graph structures, which include the [[Microsoft Academic|Microsoft Academic Knowledge Graph]], Springer Nature's [[SciGraph]], and the Semantic Scholar Corpus.<ref>{{Cite book|last1=Jose|first1=Joemon M.|title=Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I|last2=Yilmaz|first2=Emine|last3=Magalhães|first3=João|last4=Castells|first4=Pablo|last5=Ferro|first5=Nicola|last6=Silva|first6=Mário J.|last7=Martins|first7=Flávio|publisher=Springer Nature|year=2020|isbn=978-3-030-45438-8|location=Cham, Switzerland|pages=254|language=en}}</ref>
|website=International Journal of Language and Literary Studies|access-date=2021-11-09}}</ref> The AI technology is designed to identify hidden connections and links between research topics.<ref>{{Cite book|last=Baykoucheva|first=Svetla|title=Driving Science Information Discovery in the Digital Age|publisher=Chandos Publishing|year=2021|isbn=978-0-12-823724-3|page=91|oclc=1241441806|language=en}}</ref> Like the previously cited search engines, Semantic Scholar also exploits graph structures, which include the [[Microsoft Academic|Microsoft Academic Knowledge Graph]], Springer Nature's [[SciGraph]], and the Semantic Scholar Corpus (originally a 45 million papers corpus in computer science, neuroscience and biomedicine).<ref>{{Cite book|last1=Jose|first1=Joemon M.|title=Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I|last2=Yilmaz|first2=Emine|last3=Magalhães|first3=João|last4=Castells|first4=Pablo|last5=Ferro|first5=Nicola|last6=Silva|first6=Mário J.|last7=Martins|first7=Flávio|publisher=Springer Nature|year=2020|isbn=978-3-030-45438-8|location=Cham, Switzerland|page=254|language=en|oclc=1164658107}}</ref><ref>{{Cite web |last=Ammar |first=Waleed |date=2019 |title=Open Research Corpus |url=http://labs.semanticscholar.org/corpus/ |url-status=dead |archive-url=https://web.archive.org/web/20190329035308/http://labs.semanticscholar.org/corpus/ |archive-date=2019-03-29 |access-date=2024-08-05 |website=Semantic Scholar Lab Open Research Corpus}}</ref>


== Article identifier {{anchor|S2CID}}==
Each paper hosted by Semantic Scholar is assigned a unique [[identifier]] called the Semantic Scholar Corpus ID (abbreviated S2CID). The following entry is an example:
Each paper hosted by Semantic Scholar is assigned a unique [[identifier]] called the Semantic Scholar Corpus ID (abbreviated S2CID). The following entry is an example:
<blockquote>{{Cite journal <!-- Citation bot bypass-->|last1=Liu |first1=Ying |last2=Gayle |first2=Albert A |last3=Wilder-Smith |first3=Annelies |last4=Rocklöv |first4=Joacim |date=March 2020 |title=The reproductive number of COVID-19 is higher compared to SARS coronavirus |journal=Journal of Travel Medicine |volume=27 |issue=2 |pmid=32052846 |doi=10.1093/jtm/taaa021 |s2cid=211099356 }}</blockquote>


== Indexing ==
<blockquote>{{Cite journal <!-- Citation bot bypass-->|last1=Liu |first1=Ying |last2=Gayle |first2=Albert A |last3=Wilder-Smith |first3=Annelies |last4=Rocklöv |first4=Joacim |date=March 2020 |title=The reproductive number of COVID-19 is higher compared to SARS coronavirus |journal=Journal of Travel Medicine |volume=27 |issue=2 |pmid=32052846|doi=10.1093/jtm/taaa021 |s2cid=211099356 }}</blockquote>
Semantic Scholar is free to use and unlike similar search engines (i.e. [[Google Scholar]]) does not search for material that is behind a [[paywall]].<ref name=":0" />{{citation needed|reason=This source does make this claim, but as a throwaway line by a non-expert. Can we find a better source? The claim seems false.|date=March 2023}}

Semantic Scholar is free to use and unlike similar search engines (i.e. [[Google Scholar]]) does not search for material that is behind a [[paywall]].<ref name=":0" />{{cn|reason=This source does make this claim, but as a throwaway line by a non-expert. Can we find a better source? The claim seems false.|date=March 2023}}


One study compared the index scope of Semantic Scholar to Google Scholar, and found that for the papers cited by secondary studies in computer science, the two indices had comparable coverage, each only missing a handful of the papers.<ref name=":1">{{Cite journal|last=Hannousse|first=Abdelhakim|date=2021|title=Searching relevant papers for software engineering secondary studies: Semantic Scholar coverage and identification role|url=https://onlinelibrary.wiley.com/doi/abs/10.1049/sfw2.12011|journal=IET Software|language=en|volume=15|issue=1|pages=126–146|doi=10.1049/sfw2.12011|s2cid=234053002|issn=1751-8814}}</ref>
One study compared the index scope of Semantic Scholar to Google Scholar, and found that for the papers cited by secondary studies in computer science, the two indices had comparable coverage, each only missing a handful of the papers.<ref name=":1">{{Cite journal|last=Hannousse|first=Abdelhakim|date=2021|title=Searching relevant papers for software engineering secondary studies: Semantic Scholar coverage and identification role|url=https://onlinelibrary.wiley.com/doi/abs/10.1049/sfw2.12011|journal=IET Software|language=en|volume=15|issue=1|pages=126–146|doi=10.1049/sfw2.12011|s2cid=234053002|issn=1751-8814}}</ref>


== Number of users and publications ==
== Number of users and publications ==
As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from [[computer science]] and [[biomedicine]].<ref>{{Cite news |date=2017-10-17 |title=AI2 scales up Semantic Scholar search engine to encompass biomedical research |language=en-US |work=GeekWire |url=https://www.geekwire.com/2017/ai2-semantic-scholar-biomedicine/ |url-status=live |access-date=2018-01-18 |archive-url=https://web.archive.org/web/20180119120110/https://www.geekwire.com/2017/ai2-semantic-scholar-biomedicine/ |archive-date=2018-01-19}}</ref> In March 2018, Doug Raymond, who developed [[machine learning]] initiatives for the [[Amazon Alexa]] platform, was hired to lead the Semantic Scholar project.<ref>{{Cite web |date=2018-05-02 |title=Tech Moves: Allen Instititue Hires Amazon Alexa Machine Learning Leader; Microsoft Chairman Takes on New Investor Role; and More |url=https://www.geekwire.com/2018/tech-moves-allen-institute-hires-amazon-alexa-machine-learning-leader-microsoft-chairman-takes-new-investor-role/ |url-status=live |archive-url=https://web.archive.org/web/20180510120907/https://www.geekwire.com/2018/tech-moves-allen-institute-hires-amazon-alexa-machine-learning-leader-microsoft-chairman-takes-new-investor-role/ |archive-date=2018-05-10 |access-date=2018-05-09 |publisher=GeekWire}}</ref> {{As of|2019|Aug}}, the number of included papers metadata (not the actual PDFs) had grown to more than 173 million<ref>{{Cite web |title=Semantic Scholar |url=https://www.semanticscholar.org/ |url-status=live |archive-url=https://web.archive.org/web/20190811212806/https://www.semanticscholar.org/ |archive-date=11 August 2019 |access-date=11 August 2019 |website=Semantic Scholar}}</ref> after the addition of the [[Microsoft Academic Graph]] records.<ref>{{Cite web |date=2018-12-05 |title=AI2 joins forces with Microsoft Research to upgrade search tools for scientific studies 
|url=https://www.geekwire.com/2018/ai2-joins-forces-microsoft-upgrade-search-tools-scientific-research/ |url-status=live |archive-url=https://web.archive.org/web/20190825181331/https://www.geekwire.com/2018/ai2-joins-forces-microsoft-upgrade-search-tools-scientific-research/ |archive-date=2019-08-25 |access-date=2019-08-25 |website=GeekWire}}</ref> In 2020, a partnership between Semantic Scholar and the [[University of Chicago Press|University of Chicago Press Journals]] made all articles published under the University of Chicago Press available in the Semantic Scholar corpus.<ref>{{Cite web|title=The University of Chicago Press joins more than 500 publishers working with Semantic Scholar to improve search and discoverability|url=https://www.journals.uchicago.edu/journals/pr/201215|access-date=2021-11-22|website=RCNi Company Limited|language=en}}</ref> At the end of 2020, Semantic Scholar had indexed 190 million papers.<ref>{{Cite news|last=Dunn|first=Adriana|date=December 14, 2020|title=Semantic Scholar Adds 25 Million Scientific Papers in 2020 Through New Publisher Partnerships|work=Semantic Scholar|url=https://allenai.org/content/docs/Semantic_Scholar_2020_Publisher_Partners.pdf|access-date=November 22, 2021}}</ref>
As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from [[computer science]] and [[biomedicine]].<ref>{{Cite news |date=2017-10-17 |title=AI2 scales up Semantic Scholar search engine to encompass biomedical research |language=en-US |work=GeekWire |url=https://www.geekwire.com/2017/ai2-semantic-scholar-biomedicine/ |url-status=live |access-date=2018-01-18 |archive-url=https://web.archive.org/web/20180119120110/https://www.geekwire.com/2017/ai2-semantic-scholar-biomedicine/ |archive-date=2018-01-19}}</ref> In March 2018, Doug Raymond, who developed [[machine learning]] initiatives for the [[Amazon Alexa]] platform, was hired to lead the Semantic Scholar project.<ref>{{Cite web |date=2018-05-02 |title=Tech Moves: Allen Instititue Hires Amazon Alexa Machine Learning Leader; Microsoft Chairman Takes on New Investor Role; and More |url=https://www.geekwire.com/2018/tech-moves-allen-institute-hires-amazon-alexa-machine-learning-leader-microsoft-chairman-takes-new-investor-role/ |url-status=live |archive-url=https://web.archive.org/web/20180510120907/https://www.geekwire.com/2018/tech-moves-allen-institute-hires-amazon-alexa-machine-learning-leader-microsoft-chairman-takes-new-investor-role/ |archive-date=2018-05-10 |access-date=2018-05-09 |publisher=GeekWire}}</ref> {{As of|2019|Aug}}, the number of included papers metadata (not the actual PDFs) had grown to more than 173 million<ref>{{Cite web |title=Semantic Scholar |url=https://www.semanticscholar.org/ |url-status=live |archive-url=https://web.archive.org/web/20190811212806/https://www.semanticscholar.org/ |archive-date=11 August 2019 |access-date=11 August 2019 |website=Semantic Scholar}}</ref> after the addition of the [[Microsoft Academic Graph]] records.<ref>{{Cite web |date=2018-12-05 |title=AI2 joins forces with Microsoft Research to upgrade search tools for scientific studies 
|url=https://www.geekwire.com/2018/ai2-joins-forces-microsoft-upgrade-search-tools-scientific-research/ |url-status=live |archive-url=https://web.archive.org/web/20190825181331/https://www.geekwire.com/2018/ai2-joins-forces-microsoft-upgrade-search-tools-scientific-research/ |archive-date=2019-08-25 |access-date=2019-08-25 |website=GeekWire}}</ref> In 2020, a partnership between Semantic Scholar and the [[University of Chicago Press|University of Chicago Press Journals]] made all articles published under the University of Chicago Press available in the Semantic Scholar corpus.<ref>{{Cite web|title=The University of Chicago Press joins more than 500 publishers working with Semantic Scholar to improve search and discoverability|url=https://www.journals.uchicago.edu/journals/pr/201215|access-date=2021-11-22|website=RCNi Company Limited|language=en}}</ref> At the end of 2020, Semantic Scholar had indexed 190 million papers.<ref>{{Cite news|last=Dunn|first=Adriana|date=December 14, 2020|title=Semantic Scholar Adds 25 Million Scientific Papers in 2020 Through New Publisher Partnerships|work=Semantic Scholar|url=https://allenai.org/content/docs/Semantic_Scholar_2020_Publisher_Partners.pdf|access-date=November 22, 2021}}</ref> In 2020, Semantic Scholar reached seven million users per month.<ref name="Grad 24Nov2020"/>

In 2020, of Semantic Scholar reached seven million users per month.<ref name="Grad 24Nov2020"/>


==See also==
==See also==
Line 56: Line 55:
{{Authority control}}
{{Authority control}}


[[Category:Internet properties established in 2015]]
[[Category:Bibliographic databases in computer science]]
[[Category:Bibliographic databases in computer science]]
[[Category:Scholarly search services]]
[[Category:Scholarly search services]]

Latest revision as of 07:49, 6 October 2024

Semantic Scholar
Type of site: Search engine
Created by: Allen Institute for Artificial Intelligence
URL: semanticscholar.org
Launched: November 2, 2015; 9 years ago (2015-11-02)[1]

Semantic Scholar is a research tool for scientific literature powered by artificial intelligence. It is developed at the Allen Institute for AI and was publicly released in November 2015.[2] Semantic Scholar uses modern techniques in natural language processing to support the research process, for example by providing automatically generated summaries of scholarly papers.[3] The Semantic Scholar team is actively researching the use of artificial intelligence in natural language processing, machine learning, human–computer interaction, and information retrieval.[4]

Semantic Scholar began as a database for the topics of computer science, geoscience, and neuroscience.[5] In 2017, the system began including biomedical literature in its corpus.[5] As of September 2022, it includes over 200 million publications from all fields of science.[6]

Technology


Semantic Scholar provides one-sentence summaries of scientific papers. One of its aims was to address the challenge of reading numerous titles and lengthy abstracts on mobile devices.[7] It also seeks to ensure that the three million scientific papers published yearly reach readers, since it is estimated that only half of this literature is ever read.[8]

Artificial intelligence is used to capture the essence of a paper, generating the summary through an "abstractive" technique.[3] The project uses a combination of machine learning, natural language processing, and machine vision to add a layer of semantic analysis to the traditional methods of citation analysis, and to extract relevant figures, tables, entities, and venues from papers.[9][10]

Another key AI-powered feature is Research Feeds, an adaptive research recommender that uses AI to quickly learn what papers users care about reading and recommends the latest research to help scholars stay up to date. It uses a state-of-the-art paper embedding model trained using contrastive learning to find papers similar to those in each Library folder.[11]
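
The source does not describe the embedding model or how Research Feeds ranks papers. As a generic illustration only, the sketch below shows one common way embedding-based similarity can work: candidate papers are ranked by cosine similarity to the average embedding of the papers in a folder. The vectors, paper names, and function names are all made up for the example, and averaging the folder is an assumption, not the documented method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(folder, candidates):
    """Rank candidate papers by similarity to the folder's mean embedding.

    folder: list of embedding vectors for papers already in a folder.
    candidates: dict mapping a (hypothetical) paper name to its embedding.
    """
    dim = len(folder[0])
    # Assumption: summarize the folder by the mean of its embeddings.
    centroid = [sum(vec[i] for vec in folder) / len(folder) for i in range(dim)]
    scored = [(name, cosine(centroid, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy three-dimensional embeddings; real models use hundreds of dimensions.
folder = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
candidates = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.0, 0.0, 1.0],
    "paper_c": [0.5, 0.5, 0.0],
}
ranking = rank_candidates(folder, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data, the candidate pointing in the same direction as the folder's mean ranks first and the orthogonal one ranks last.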

Semantic Scholar also offers Semantic Reader, an augmented reader intended to make scientific reading more accessible and richly contextual.[12] Semantic Reader provides in-line citation cards, which show automatically generated TLDR (short for "Too Long, Didn't Read") summaries of cited papers as users read, and skimming highlights, which capture the key points of a paper so that users can digest it faster.

In contrast with Google Scholar and PubMed, Semantic Scholar is designed to highlight the most important and influential elements of a paper.[13] The AI technology is designed to identify hidden connections and links between research topics.[14] Like the previously cited search engines, Semantic Scholar also exploits graph structures, which include the Microsoft Academic Knowledge Graph, Springer Nature's SciGraph, and the Semantic Scholar Corpus (originally a corpus of 45 million papers in computer science, neuroscience and biomedicine).[15][16]

Article identifier


Each paper hosted by Semantic Scholar is assigned a unique identifier called the Semantic Scholar Corpus ID (abbreviated S2CID). The following entry is an example:

Liu, Ying; Gayle, Albert A; Wilder-Smith, Annelies; Rocklöv, Joacim (March 2020). "The reproductive number of COVID-19 is higher compared to SARS coronavirus". Journal of Travel Medicine. 27 (2). doi:10.1093/jtm/taaa021. PMID 32052846. S2CID 211099356.
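
As a hedged illustration of how such an identifier might be used programmatically, the sketch below builds a lookup URL for the S2CID in the example above. It assumes the public Semantic Scholar Graph API paper-lookup endpoint, which is not described in this article, and that the endpoint accepts a "CorpusId:" prefix and a "fields" query parameter; the function name is hypothetical. No network request is made.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of the public Semantic Scholar Graph API paper lookup.
API_BASE = "https://api.semanticscholar.org/graph/v1/paper/"


def corpus_id_lookup_url(s2cid, fields=("title", "year")):
    """Build a paper-lookup URL for a Semantic Scholar Corpus ID (S2CID)."""
    if s2cid <= 0:
        raise ValueError("S2CID must be a positive integer")
    # Keep the colon in "CorpusId:<id>" unescaped in the path.
    paper_id = quote(f"CorpusId:{s2cid}", safe=":")
    # Keep commas unescaped so the field list stays readable.
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    return f"{API_BASE}{paper_id}?{query}"


url = corpus_id_lookup_url(211099356)
print(url)
```

Fetching the resulting URL with any HTTP client would then be expected to return the paper's metadata as JSON, subject to the API's rate limits and terms.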

Indexing


Semantic Scholar is free to use and, unlike similar search engines (e.g. Google Scholar), does not search for material that is behind a paywall.[5][citation needed]

One study compared the index scope of Semantic Scholar to Google Scholar, and found that for the papers cited by secondary studies in computer science, the two indices had comparable coverage, each only missing a handful of the papers.[17]

Number of users and publications


As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from computer science and biomedicine.[18] In March 2018, Doug Raymond, who developed machine learning initiatives for the Amazon Alexa platform, was hired to lead the Semantic Scholar project.[19] As of August 2019, the number of papers with included metadata (not the actual PDFs) had grown to more than 173 million[20] after the addition of the Microsoft Academic Graph records.[21] In 2020, a partnership between Semantic Scholar and the University of Chicago Press Journals made all articles published under the University of Chicago Press available in the Semantic Scholar corpus.[22] At the end of 2020, Semantic Scholar had indexed 190 million papers.[23] In 2020, Semantic Scholar reached seven million users per month.[7]

See also

References

  1. ^ Jones, Nicola (2015). "Artificial-intelligence institute launches free science search engine". Nature. doi:10.1038/nature.2015.18703. ISSN 1476-4687. S2CID 182440976.
  2. ^ Eunjung Cha, Ariana (3 November 2015). "Paul Allen's AI research group unveils program that aims to shake up how we search scientific knowledge. Give it a try". The Washington Post. Archived from the original on 6 November 2019. Retrieved November 3, 2015.
  3. ^ a b Hao, Karen (November 18, 2020). "An AI helps you summarize the latest in AI". MIT Technology Review. Retrieved 2021-02-16.
  4. ^ "Semantic Scholar Research". research.semanticscholar.org. Retrieved 2021-11-22.
  5. ^ a b c Fricke, Suzanne (2018-01-12). "Semantic Scholar". Journal of the Medical Library Association. 106 (1): 145–147. doi:10.5195/jmla.2018.280. ISSN 1558-9439. PMC 5764585. S2CID 45802944.
  6. ^ Matthews, David (1 September 2021). "Drowning in the literature? These smart software tools can help". Nature. Retrieved 5 September 2022. ...the publicly available corpus compiled by Semantic Scholar – a tool set up in 2015 by the Allen Institute for Artificial Intelligence in Seattle, Washington – amounting to around 200 million articles, including preprints.
  7. ^ a b Grad, Peter (November 24, 2020). "AI tool summarizes lengthy papers in a sentence". Tech Xplore. Retrieved 2021-02-16.
  8. ^ "Allen Institute's Semantic Scholar now searches across 175 million academic papers". VentureBeat. 2019-10-23. Retrieved 2021-02-16.
  9. ^ Bohannon, John (11 November 2016). "A computer program just ranked the most influential brain scientists of the modern era". Science. doi:10.1126/science.aal0371. Archived from the original on 29 April 2020. Retrieved 12 November 2016.
  10. ^ Christopher Clark; Santosh Divvala (2016), PDFFigures 2.0: Mining figures from research papers, Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries - JCDL '16, Wikidata Q108172042
  11. ^ "Semantic Scholar | Frequently Asked Questions". Archived from the original on July 15, 2023.
  12. ^ "Semantic Scholar | Semantic Reader". Semantic Scholar. Archived from the original on July 15, 2023.
  13. ^ "Semantic Scholar". International Journal of Language and Literary Studies. Retrieved 2021-11-09.
  14. ^ Baykoucheva, Svetla (2021). Driving Science Information Discovery in the Digital Age. Chandos Publishing. p. 91. ISBN 978-0-12-823724-3. OCLC 1241441806.
  15. ^ Jose, Joemon M.; Yilmaz, Emine; Magalhães, João; Castells, Pablo; Ferro, Nicola; Silva, Mário J.; Martins, Flávio (2020). Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. Cham, Switzerland: Springer Nature. p. 254. ISBN 978-3-030-45438-8. OCLC 1164658107.
  16. ^ Ammar, Waleed (2019). "Open Research Corpus". Semantic Scholar Lab Open Research Corpus. Archived from the original on 2019-03-29. Retrieved 2024-08-05.
  17. ^ Hannousse, Abdelhakim (2021). "Searching relevant papers for software engineering secondary studies: Semantic Scholar coverage and identification role". IET Software. 15 (1): 126–146. doi:10.1049/sfw2.12011. ISSN 1751-8814. S2CID 234053002.
  18. ^ "AI2 scales up Semantic Scholar search engine to encompass biomedical research". GeekWire. 2017-10-17. Archived from the original on 2018-01-19. Retrieved 2018-01-18.
  19. ^ "Tech Moves: Allen Instititue Hires Amazon Alexa Machine Learning Leader; Microsoft Chairman Takes on New Investor Role; and More". GeekWire. 2018-05-02. Archived from the original on 2018-05-10. Retrieved 2018-05-09.
  20. ^ "Semantic Scholar". Semantic Scholar. Archived from the original on 11 August 2019. Retrieved 11 August 2019.
  21. ^ "AI2 joins forces with Microsoft Research to upgrade search tools for scientific studies". GeekWire. 2018-12-05. Archived from the original on 2019-08-25. Retrieved 2019-08-25.
  22. ^ "The University of Chicago Press joins more than 500 publishers working with Semantic Scholar to improve search and discoverability". RCNi Company Limited. Retrieved 2021-11-22.
  23. ^ Dunn, Adriana (December 14, 2020). "Semantic Scholar Adds 25 Million Scientific Papers in 2020 Through New Publisher Partnerships" (PDF). Semantic Scholar. Retrieved November 22, 2021.