GoKawiil - Latest Tech News & Aggregated Headlines

Evaluating Long-Context Question and Answer Systems

news.ycombinator.com Eugene Yan 2026-02-12 17:47:27

While evaluating Q&A systems is straightforward with short paragraphs, complexity increases as documents grow larger, for example with technical documentation, novels and movies, as well as multi-document scenarios. Although some of these evaluation challenges also appear in shorter contexts, long-context evaluation amplifies issues such as: Information overload: Irrelevant details in large documents obscure relevant facts, making it harder for retrievers and models to locate the right evidence for

Topics: answer answers information models questions

Show HN: I Built AskMedically – Get Research-Backed Answers to Medical Queries

news.ycombinator.com Unknown 2026-02-24 22:43:57

Hi HN, I’ve built AskMedically – an AI-powered assistant that answers health and medical questions using real research papers from trusted medical sources like PubMed, Cochrane, etc. Whether you’re a healthcare enthusiast, patient, student, or professional – AskMedically helps you explore trusted medical knowledge without needing a medical degree or slogging through dozens of PDFs. Examples: • “Does intermittent fasting improve insulin sensitivity?” • “What are the benefits of creatine for

Topics: askmedically explore health medical questions

A Chinese firm has just launched a constantly changing set of AI benchmarks

technologyreview.com Caiwei Chen 2026-03-01 20:46:28

Development of the benchmark at HongShan began in 2022, following ChatGPT’s breakout success, as an internal tool for assessing which models are worth investing in. Since then, led by partner Gong Yuan, the team has steadily expanded the system, bringing in outside researchers and professionals to help refine it. As the project grew more sophisticated, they decided to release it to the public. Xbench approached the problem with two different systems. One is similar to traditional benchmarking:

Topics: like model models questions xbench

Think of a Number

news.ycombinator.com Xenaproject 2026-03-06 08:34:44

My feed was recently clogged up with news articles reporting that Sam Altman thinks that AGI is here, or will be here next year, or whatever. I will refrain from giving even more air to this nonsense by linking to the stories. This kind of irresponsible hype-generation drives me nuts (although it also drives up stock prices so I can see why the tech bros are motivated to do it). Sure AI can have a good crack at undergraduate mathematics right now, and sure that’s pretty amazing. But our universi

Topics: level mathematics need questions undergraduate

No Hello

news.ycombinator.com Unknown 2026-03-12 22:09:22

Note that Keith could have got his answer minutes sooner, and needn't have kept Tim waiting. In fact, Tim could have started thinking about the question right away! People who do this are generally trying to be polite by not jumping right into the request, like one would in person or on the phone - and that's great! But it's 2022 and chat is neither of those things. For most people, typing is much slower than talking. So despite best intentions, you're actually just making the other person wait

Topics: got people person question right

Chemical knowledge and reasoning of large language models vs. chemist expertise

news.ycombinator.com Mirza 2026-03-13 11:22:06

Benchmark corpus To compile our benchmark corpus, we utilized a broad list of sources (Methods), ranging from completely novel, manually crafted questions over university exams to semi-automatically generated questions based on curated subsets of data in chemical databases. For quality assurance, all questions have been reviewed by at least two scientists in addition to the original curator and automated checks. Importantly, our large pool of questions encompasses a wide range of topics and que

Topics: chembench fig models performance questions

About GoKawiil

Privacy

Advertising