-
Next Generation Internet Initiative An opportunity to fix the internet
FOSDEM 2018 Hacking conference
#hacking, #hackers, #infosec, #opsec, #IT, #security
published: 17 Jun 2022
-
Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation (CIDR 2021)
Authors: Laurel Orr (Stanford University); Megan Leszczynski (Stanford University); Neel Guha (Stanford University); Sen Wu (Stanford University); Simran Arora (Stanford University); Xiao Ling (); Christopher Re (Stanford University)
Paper: http://cidrdb.org/cidr2021/papers/cidr2021_paper13.pdf
published: 12 Jan 2021
-
[NGI Forum 2019] SOLUTIONS FOR THE FUTURE. Martin Wezowski
Futurist Martin Wezowski looks at what happens when machine intelligence is matched with human ingenuity and gives us a tantalizing glimpse of a bright future that includes everybody.
---
The Next Generation Internet (NGI) initiative aims to foster a vibrant open Human-centric Internet movement that links research, policy and society for the creation of a better Internet – an Internet that respects fundamental values of privacy, participation and diversity.
The NGI Forum is the flagship annual event that gathers together prominent researchers, innovators and policy makers at work on several fronts to restructure the Internet to be fit for the future we want, while we continue using it to help run our societies and economies. This year it will take place on the 25 September in Helsinki (...
published: 09 Oct 2019
-
Deduplication and Author Disambiguation of Streaming Records via Supervised Models -Reza Karimi
"Here we present a general supervised framework for record deduplication and author-disambiguation via Spark. This work differentiates itself by - Application of Databricks and AWS makes this a scalable implementation. Compute resources are comparably lower than traditional legacy technology using big boxes 24/7. Scalability is crucial as Elsevier's Scopus data, the biggest scientific abstract repository, covers roughly 250 million authorships from 70 million abstracts covering a few hundred years. - We create a fingerprint for each content by deep learning and/or word2vec algorithms to expedite pairwise similarity calculation. These encoders substantially reduce compute time while maintaining semantic similarity (unlike traditional TFIDF or predefined taxonomies). We will briefly discuss ...
published: 30 Oct 2017
-
Bootleg: Guidable Self-Supervision for Named Entity Disambiguation -- Chris Re (Stanford University)
September 18, 2020
Abstract
Mapping textual mentions to entities in a knowledge graph is a key step in using knowledge graphs, called Named Entity Disambiguation (NED). A key challenge in NED is generalizing to rarely seen (tail) entities. Traditionally NED uses hand-tuned patterns from a knowledge base to capture rare, but reliable, signals. Hand-built features make it challenging to deploy and maintain NED–especially in multiple locales. While at Apple in 2018, we built a self-supervised system for NED that was deployed in a handful of locales and that improved performance of downstream models significantly. However, due to the fog of production, it was unclear what aspects of these models were most valuable. Motivated by this experience, we built Bootleg, a clean-slate, open-source, s...
published: 12 Jan 2023
-
Question Generation using Natural Language Processing in EdTech
Timestamp:
00:00 Introduction
01:59 Pipeline
03:29 Google BERT for Extractive Text Summarisation
04:41 Python Keyphrase Extraction
05:46 Sentence Mapping with FlashText
06:44 Generating Distractors
07:27 WordNet
08:01 Snippet from Colab - WordNet
08:36 Word Sense Disambiguation
09:40 ConceptNet
11:12 Sense2Vec
12:30 Short Demo with Streamlit
La Kopi @ Developers Space is a monthly open mic night for the developer community to learn, connect, and be inspired by each other. Every month, a tech theme is selected and developers submit their topics to be shared with the community.
Name: Hardik Nahata (AI Engineer, Aspecto Technologies)
Topic: Question Generation using Natural Language Processing in EdTech
Learn how to leverage Natural Language Processing to generate multiple choice quest...
published: 21 Jul 2021
-
Automated Author Disambiguation
A web application was developed as the result of a Case Study Project based on the article "Bastrakova E., Ledesma R., & Millan J. (2016). Author Disambiguation. University Lumiere Lyon 2"
Visit our GitHub repository for more information: https://github.com/DMKM1517/author_disambiguation
published: 28 Jun 2016
-
How the Coming Population Collapse Will Change Society Forever
Get a 14-day free trial with my sponsor Aura: https://aura.com/moon
YouTube with Moon: https://www.skool.com/moon-society-5881/about
Support the channel here (all money goes straight back into the channel):
► Become a Patron: https://www.patreon.com/MoonReal
► Follow my Twitter: https://twitter.com/MoonRealYT
published: 06 Jun 2024
-
NGI Webinar Data Methods
A webinar held by NGI Forward project on data methods.
published: 23 Apr 2019
55:54
Next Generation Internet Initiative An opportunity to fix the internet
FOSDEM 2018 Hacking conference
#hacking, #hackers, #infosec, #opsec, #IT, #security
https://wn.com/Next_Generation_Internet_Initiative_An_Opportunity_To_Fix_The_Internet
- published: 17 Jun 2022
- views: 1
11:26
Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation (CIDR 2021)
Authors: Laurel Orr (Stanford University); Megan Leszczynski (Stanford University); Neel Guha (Stanford University); Sen Wu (Stanford University); Simran Arora (Stanford University); Xiao Ling (); Christopher Re (Stanford University)
Paper: http://cidrdb.org/cidr2021/papers/cidr2021_paper13.pdf
https://wn.com/Bootleg_Chasing_The_Tail_With_Self_Supervised_Named_Entity_Disambiguation_(Cidr_2021)
- published: 12 Jan 2021
- views: 230
7:20
[NGI Forum 2019] SOLUTIONS FOR THE FUTURE. Martin Wezowski
Futurist Martin Wezowski looks at what happens when machine intelligence is matched with human ingenuity and gives us a tantalizing glimpse of a bright future that includes everybody.
---
The Next Generation Internet (NGI) initiative aims to foster a vibrant open Human-centric Internet movement that links research, policy and society for the creation of a better Internet – an Internet that respects fundamental values of privacy, participation and diversity.
The NGI Forum is the flagship annual event that gathers prominent researchers, innovators and policy makers working on several fronts to restructure the Internet so that it is fit for the future we want, while we continue using it to help run our societies and economies. This year it will take place on 25 September in Helsinki (Finland), co-located with the My Data 2019 Conference.
ngiforum.eu
ngi.eu
https://wn.com/Ngi_Forum_2019_Solutions_For_The_Future._Martin_Wezowski
- published: 09 Oct 2019
- views: 130
30:45
Deduplication and Author Disambiguation of Streaming Records via Supervised Models -Reza Karimi
"Here we present a general supervised framework for record deduplication and author disambiguation via Spark. This work differentiates itself in three ways:
- Databricks and AWS make this a scalable implementation. Compute resources are comparatively lower than those of traditional legacy technology running big boxes 24/7. Scalability is crucial, as Elsevier's Scopus data, the biggest scientific abstract repository, covers roughly 250 million authorships from 70 million abstracts spanning a few hundred years.
- We create a fingerprint for each piece of content with deep learning and/or word2vec algorithms to expedite pairwise similarity calculation. These encoders substantially reduce compute time while maintaining semantic similarity (unlike traditional TFIDF or predefined taxonomies). We will briefly discuss how to optimize word2vec training with high parallelization. Moreover, we show how these encoders can be used to derive a standard representation for all our entities, such as documents, authors, users and journals. This standard representation can reduce the recommendation problem to a pairwise similarity search, so it can offer a basic recommender for cross-product applications where we may not have a dedicated recommender engine.
- Traditional author-disambiguation or record-deduplication algorithms are batch processes with little to no training data. However, we have roughly 25 million authorships that are manually curated or corrected upon user feedback. It is therefore crucial to maintain historical profiles, and we have developed a machine learning implementation that deals with data streams and processes them in mini batches or one document at a time. We will discuss how to measure the accuracy of such a system, how to tune it, and how to process the raw output of the pairwise similarity function into final clusters.
Lessons learned from this talk can help all sorts of companies that want to integrate their data or deduplicate their user/customer/product databases.
Session hashtag: #EUai2"
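The abstract above mentions turning pairwise similarities between fingerprints into final clusters. As a rough illustration only (not the implementation presented in the talk), the sketch below links records whose fingerprint vectors exceed a cosine-similarity threshold and merges linked records with a union-find structure. The record ids, the three-dimensional vectors and the `threshold` value are all made-up assumptions; a real system would use learned embeddings and a supervised pairwise model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(fingerprints, threshold=0.9):
    """Group record ids whose fingerprints are at least `threshold` similar.

    `fingerprints` maps a record id to its vector (hypothetical toy data here).
    Records are merged transitively via union-find.
    """
    parent = {rid: rid for rid in fingerprints}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = sorted(fingerprints)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(fingerprints[a], fingerprints[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for rid in ids:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(groups.values())


# Toy fingerprints: r1 and r2 point in nearly the same direction, r3 does not.
records = {
    "r1": [0.9, 0.1, 0.0],
    "r2": [0.88, 0.12, 0.01],
    "r3": [0.0, 0.2, 0.95],
}
clusters = cluster(records)
print(clusters)
```

The all-pairs loop is quadratic, which is only workable for toy inputs; at the scale quoted in the abstract some form of blocking or approximate nearest-neighbour search would be needed before any pairwise comparison.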
About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unified-data-analytics-platform
Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc/
Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-named-leader-by-gartner
https://wn.com/Deduplication_And_Author_Disambiguation_Of_Streaming_Records_Via_Supervised_Models_Reza_Karimi
- published: 30 Oct 2017
- views: 812
56:30
Bootleg: Guidable Self-Supervision for Named Entity Disambiguation -- Chris Re (Stanford University)
September 18, 2020
Abstract
Mapping textual mentions to entities in a knowledge graph is a key step in using knowledge graphs, called Named Entity Disambiguation (NED). A key challenge in NED is generalizing to rarely seen (tail) entities. Traditionally NED uses hand-tuned patterns from a knowledge base to capture rare, but reliable, signals. Hand-built features make it challenging to deploy and maintain NED–especially in multiple locales. While at Apple in 2018, we built a self-supervised system for NED that was deployed in a handful of locales and that improved performance of downstream models significantly. However, due to the fog of production, it was unclear what aspects of these models were most valuable. Motivated by this experience, we built Bootleg, a clean-slate, open-source, self-supervised system to improve tail performance using a simple transformer-based architecture. Bootleg improves tail generalization through a new inverse regularization scheme to favor more generalizable signals automatically. Bootleg-like models are used by several downstream applications. As a result, quality issues fixed in one application may need to be fixed independently in many applications. Thus, we initiate the study of techniques to fix systematic errors in self-supervised models using weak supervision, augmentation, and training set refinement. Bootleg achieves new state-of-the-art performance on the three major NED benchmarks by up to 3.3 F1 points, and it improves performance over BERT baselines on tail slices by 50.1 F1 points.
Bootleg is open source at http://hazyresearch.stanford.edu/bootleg/.
Biography
Christopher (Chris) Ré is an associate professor in the Department of Computer Science at Stanford University. He is in the Stanford AI Lab and is affiliated with the Statistical Machine Learning Group. His recent work is to understand how software and hardware systems will change as a result of machine learning along with a continuing, petulant drive to work on math problems. Research from his group has been incorporated into scientific and humanitarian efforts, such as the fight against human trafficking, along with products from technology and enterprise companies. He has cofounded four companies based on his research into machine learning systems: SambaNova and Snorkel, along with two companies that are now part of Apple, Lattice (DeepDive) in 2017 and Inductiv (HoloClean) in 2020.
He received a SIGMOD Dissertation Award in 2010, an NSF CAREER Award in 2011, an Alfred P. Sloan Fellowship in 2013, a Moore Data Driven Investigator Award in 2014, the VLDB Early Career Award in 2015, the MacArthur Foundation Fellowship in 2015, and an Okawa Research Grant in 2016. His research contributions have spanned database theory, database systems, and machine learning, and his work has won best paper at a premier venue in each area, respectively, at PODS 2012, SIGMOD 2014, and ICML 2016.
https://wn.com/Bootleg_Guidable_Self_Supervision_For_Named_Entity_Disambiguation_Chris_Re_(Stanford_University)
- published: 12 Jan 2023
- views: 135
14:33
Question Generation using Natural Language Processing in EdTech
Timestamp:
00:00 Introduction
01:59 Pipeline
03:29 Google BERT for Extractive Text Summarisation
04:41 Python Keyphrase Extraction
05:46 Sentence Mapping with FlashText
06:44 Generating Distractors
07:27 WordNet
08:01 Snippet from Colab - WordNet
08:36 Word Sense Disambiguation
09:40 ConceptNet
11:12 Sense2Vec
12:30 Short Demo with Streamlit
La Kopi @ Developers Space is a monthly open mic night for the developer community to learn, connect, and be inspired by each other. Every month, a tech theme is selected and developers submit their topics to be shared with the community.
Name: Hardik Nahata (AI Engineer, Aspecto Technologies)
Topic: Question Generation using Natural Language Processing in EdTech
Learn how to leverage Natural Language Processing to generate multiple choice questions from any given text. We will look into tools such as Google's BERT models, Python Keyphrase Extraction, and HuggingFace.
Key takeaways of this session include generating choices or distractors in multiple choice questions, and identifying the context of a word using word sense disambiguation. There will also be a live demo run of the solution.
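One common way to generate distractors, and presumably the idea behind the WordNet step in the timestamps, is to offer words that share a parent category with the correct answer. The sketch below shows only that idea and is not the speaker's code: `HYPERNYMS` is a small hand-written, hypothetical taxonomy standing in for WordNet's hypernym links, and `distractors` is an illustrative helper name.

```python
# Hypothetical hand-written taxonomy standing in for WordNet hypernym links.
# Each word maps to its parent category.
HYPERNYMS = {
    "lion": "big_cat",
    "tiger": "big_cat",
    "leopard": "big_cat",
    "jaguar": "big_cat",
    "sparrow": "bird",
    "eagle": "bird",
}


def distractors(answer, taxonomy=HYPERNYMS, limit=3):
    """Return up to `limit` words that share the answer's parent category."""
    parent = taxonomy.get(answer)
    if parent is None:
        # Unknown word: no category to draw siblings from.
        return []
    siblings = sorted(
        word for word, cat in taxonomy.items() if cat == parent and word != answer
    )
    return siblings[:limit]


options = distractors("lion")
print(options)
```

With a real lexical database, the word's sense would have to be chosen first, which is where the word sense disambiguation step mentioned above comes in.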
Be our next La Kopi Speaker → https://goo.gle/openmic
For more updates on upcoming events, follow us on social media:
✉️ Newsletter → https://goo.gle/devspace-news
👤 Facebook → https://www.facebook.com/DevSpaceSG/
🐦 Twitter → https://twitter.com/DevSpaceSG
🔴 Meetup → https://www.meetup.com/developer-space/
https://wn.com/Question_Generation_Using_Natural_Language_Processing_In_Edtech
- published: 21 Jul 2021
- views: 5645
2:43
Automated Author Disambiguation
A web application was developed as the result of a Case Study Project based on the article "Bastrakova E., Ledesma R., & Millan J. (2016). Author Disambiguation. University Lumiere Lyon 2"
Visit our GitHub repository for more information: https://github.com/DMKM1517/author_disambiguation
https://wn.com/Automated_Author_Disambiguation
- published: 28 Jun 2016
- views: 209
28:24
How the Coming Population Collapse Will Change Society Forever
Get a 14-day free trial with my sponsor Aura: https://aura.com/moon
YouTube with Moon: https://www.skool.com/moon-society-5881/about
Support the channel here (all money goes straight back into the channel):
► Become a Patron: https://www.patreon.com/MoonReal
► Follow my Twitter: https://twitter.com/MoonRealYT
https://wn.com/How_The_Coming_Population_Collapse_Will_Change_Society_Forever
- published: 06 Jun 2024
- views: 865762
1:00:03
NGI Webinar Data Methods
A webinar held by NGI Forward project on data methods.
https://wn.com/Ngi_Webinar_Data_Methods
- published: 23 Apr 2019
- views: 38