Humans Cannot Be the Sole Keepers of Scientific Knowledge

There is an old joke that physicists like to tell: everything has already been discovered and reported in a Russian journal in the 1960s; we just don't know about it. Though hyperbolic, the joke accurately captures the current state of affairs. The volume of knowledge is vast and growing quickly: the number of scientific articles posted on arXiv (the largest and most popular preprint server) in 2021 is expected to reach 190,000, and that is just a subset of the scientific literature produced this year.

Clearly, we do not really know what we know, because no one can read the entire literature even in their own narrow field (which includes, in addition to journal articles, PhD theses, lab notes, slides, white papers, technical notes, and reports). Indeed, it is entirely possible that in this mountain of papers, answers to many questions lie hidden, important insights have been overlooked or forgotten, and connections remain concealed.

Artificial intelligence is one potential solution. Algorithms can already analyze text without human supervision to find relationships between words that help uncover knowledge. But far more can be achieved if we move away from writing traditional scientific articles, whose style and structure have hardly changed over the past hundred years.

Text mining has a number of limitations, including access to the full text of papers and legal concerns. But most importantly, AI does not really understand concepts and the relationships between them, and it is sensitive to biases in the data set, such as the selection of papers it analyzes. It is hard for AI, and indeed even for a nonexpert human reader, to understand scientific papers, in part because the use of jargon varies from one discipline to another and the same term can be used with entirely different meanings in different fields. The increasing interdisciplinarity of research means that it is often difficult to define a topic precisely using a combination of keywords in order to find all the relevant papers. Making connections and (re)discovering similar concepts is hard even for the brightest minds.

As long as this is the case, AI cannot be trusted, and humans will have to double-check everything an AI outputs after text mining, a tedious task that defeats the very purpose of using AI. To solve this problem, we need to make science papers not only machine-readable but machine-understandable, by (re)writing them in a special kind of programming language. In other words: teach science to machines in a language they understand.

Writing scientific knowledge in a programming-like language will be dry, but it will be sustainable, because new concepts will be added directly to the library of science that machines understand. Moreover, as machines are taught more scientific facts, they will be able to help scientists streamline their logical arguments; spot errors, inconsistencies, plagiarism, and duplications; and highlight connections. AI with an understanding of physical laws is more powerful than AI trained on data alone, so science-savvy machines will be able to help with future discoveries. Machines with a great knowledge of science could assist rather than replace human scientists.

Mathematicians have already begun this process of translation. They are teaching mathematics to computers by writing theorems and proofs in languages like Lean. Lean is a proof assistant and programming language in which one can introduce mathematical concepts in the form of objects. Using known facts, Lean can check whether a proposed proof of a statement is valid, thus helping mathematicians verify proofs and identify places where their logic is not rigorous enough. The more mathematics Lean knows, the more it can do. The Xena Project at Imperial College London aims to enter the entire undergraduate mathematics curriculum into Lean. One day, proof assistants may be able to help mathematicians do research by checking their reasoning and searching the vast mathematical knowledge they possess.
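
To give a flavor of what "introducing a concept as an object" looks like, here is a minimal sketch in Lean 4 using only the core library. The names IsEven, four_is_even, and even_add are hypothetical, chosen for illustration; they are not taken from the Xena Project or from any existing Lean library.

```lean
-- Hypothetical definition for illustration: a natural number is even
-- if it equals twice some natural number.
def IsEven (n : Nat) : Prop := ∃ k, n = 2 * k

-- A concrete fact. Lean accepts the proof only because 4 = 2 * 2
-- really does hold by computation.
theorem four_is_even : IsEven 4 := ⟨2, rfl⟩

-- A general fact built on the definition: the sum of two even numbers
-- is even. Lean checks every rewriting step of the argument.
theorem even_add {m n : Nat} (hm : IsEven m) (hn : IsEven n) :
    IsEven (m + n) :=
  match hm, hn with
  | ⟨a, ha⟩, ⟨b, hb⟩ =>
    -- Witness: a + b. After rewriting m and n, the goal becomes
    -- 2 * a + 2 * b = 2 * (a + b), which distributivity closes.
    ⟨a + b, by rw [ha, hb, Nat.left_distrib]⟩
```

Once even_add has been accepted, later proofs can reuse it like any other known fact, which is the sense in which the library grows with each concept added. If a step were wrong, for example claiming IsEven 3 with the witness 1, Lean would reject the proof rather than accept it on trust.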
