Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: question

Evaluating Long-Context Question and Answer Systems

While evaluating Q&A systems is straightforward with short paragraphs, complexity increases as documents grow larger, as with technical documentation, novels and movies, and multi-document scenarios. Although some of these evaluation challenges also appear in shorter contexts, long-context evaluation amplifies issues such as: Information overload: Irrelevant details in large documents obscure relevant facts, making it harder for retrievers and models to locate the right evidence for

Show HN: I Built AskMedically – Get Research-Backed Answers to Medical Queries

Hi HN, I’ve built AskMedically – an AI-powered assistant that answers health and medical questions using real research papers from trusted medical sources like PubMed, Cochrane, etc. Whether you’re a healthcare enthusiast, patient, student, or professional – AskMedically helps you explore trusted medical knowledge without needing a medical degree or slogging through dozens of PDFs. Examples: • “Does intermittent fasting improve insulin sensitivity?” • “What are the benefits of creatine for

A Chinese firm has just launched a constantly changing set of AI benchmarks

Development of the benchmark at HongShan began in 2022, following ChatGPT’s breakout success, as an internal tool for assessing which models are worth investing in. Since then, led by partner Gong Yuan, the team has steadily expanded the system, bringing in outside researchers and professionals to help refine it. As the project grew more sophisticated, they decided to release it to the public. Xbench approached the problem with two different systems. One is similar to traditional benchmarking:

Think of a Number

My feed was recently clogged up with news articles reporting that Sam Altman thinks that AGI is here, or will be here next year, or whatever. I will refrain from giving even more air to this nonsense by linking to the stories. This kind of irresponsible hype-generation drives me nuts (although it also drives up stock prices so I can see why the tech bros are motivated to do it). Sure AI can have a good crack at undergraduate mathematics right now, and sure that’s pretty amazing. But our universi

No Hello

Note that Keith could have got his answer minutes sooner, and needn't have kept Tim waiting. In fact, Tim could have started thinking about the question right away! People who do this are generally trying to be polite by not jumping right into the request, like one would in person or on the phone - and that's great! But it's 2022 and chat is neither of those things. For most people, typing is much slower than talking. So despite best intentions, you're actually just making the other person wait

Chemical knowledge and reasoning of large language models vs. chemist expertise

Benchmark corpus To compile our benchmark corpus, we utilized a broad list of sources (Methods), ranging from completely novel, manually crafted questions over university exams to semi-automatically generated questions based on curated subsets of data in chemical databases. For quality assurance, all questions have been reviewed by at least two scientists in addition to the original curator and automated checks. Importantly, our large pool of questions encompasses a wide range of topics and que