Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: evaluation

LangChain’s Align Evals closes the evaluator trust gap with prompt-level calibration

As enterprises increasingly rely on AI models to check that their applications work well and reliably, the gaps between model-led evaluations and human evaluations have only become clearer. To close this gap, LangChain added Align Evals to LangSmith, a feature that uses prompt-level calibration to bring large language model-based evaluators closer to human preferences.
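The evaluator trust gap the article describes can be made concrete with a small sketch: score a set of outputs with an LLM judge, compare those scores to human grades on the same examples, and report the average disagreement. This is an illustration of the general idea only, not LangChain's Align Evals API; `llm_judge_score` is a hypothetical stand-in for any model-led evaluator.

```python
# Illustrative sketch (not LangChain's API): measure how closely an
# LLM-based evaluator's scores track human grades on the same examples.
from statistics import mean


def llm_judge_score(output: str) -> float:
    """Placeholder for an LLM-as-judge call returning a 0-1 quality score."""
    return 0.5  # replace with a real evaluator call


def alignment_gap(examples: list[dict]) -> float:
    """Mean absolute difference between judge scores and human labels (0-1 scale)."""
    gaps = [
        abs(llm_judge_score(ex["output"]) - ex["human_score"])
        for ex in examples
    ]
    return mean(gaps)


labeled = [
    {"output": "The refund was processed within 3 days.", "human_score": 0.9},
    {"output": "I don't know, ask support.", "human_score": 0.2},
]
print(f"avg judge/human gap: {alignment_gap(labeled):.2f}")
```

A lower gap means the automated evaluator can be trusted as a proxy for human review; calibration work like Align Evals aims to drive that number down by adjusting the evaluator prompt.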

Open-source MCPEval makes protocol-level agent testing plug-and-play

Enterprises are beginning to adopt the Model Context Protocol (MCP) mainly to help agents identify and use the right tools. Researchers from Salesforce, however, found another use for MCP: evaluating AI agents themselves. They unveiled MCPEval, a new method and open-source toolkit for protocol-level agent testing.
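Protocol-level testing of an agent boils down to observing the tool calls it makes over an MCP-style interface and comparing them against an expected trace for the task. The sketch below shows that comparison in miniature; all names are hypothetical and it does not use the MCPEval toolkit's actual API.

```python
# Illustrative sketch (not the MCPEval toolkit): score an agent by how
# faithfully its tool-call trace matches the expected trace for a task.
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    args: dict

    def matches(self, other: "ToolCall") -> bool:
        return self.tool == other.tool and self.args == other.args


def trace_accuracy(expected: list[ToolCall], actual: list[ToolCall]) -> float:
    """Fraction of expected tool calls reproduced, in order, by the agent."""
    if not expected:
        return 1.0
    hits = sum(1 for exp, act in zip(expected, actual) if exp.matches(act))
    return hits / len(expected)


expected = [ToolCall("search_flights", {"from": "SFO", "to": "JFK"})]
actual = [ToolCall("search_flights", {"from": "SFO", "to": "JFK"})]
print(f"trace accuracy: {trace_accuracy(expected, actual):.0%}")
```

Because the check happens at the protocol layer rather than inside any one framework, the same harness can be pointed at any agent that speaks MCP, which is what makes this style of testing "plug-and-play."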

LSM-2: Learning from incomplete wearable sensor data

Training and evaluation: We use a dataset of 40 million hours of wearable data sampled from more than 60,000 participants between March and May 2024. The dataset was thoroughly anonymized or de-identified so that participant information was removed and privacy was preserved. Subjects wore a variety of Fitbit and Google Pixel smartwatches and trackers and consented to their data being used for research and development of new health and wellness products and services.
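Real wearable streams are full of gaps (off-wrist periods, dead batteries, sync failures), which is why the headline stresses learning from incomplete data. One generic way to handle this, shown below as a sketch and not as the LSM-2 method itself, is to pair each sensor window with an explicit missingness mask so the model can tell "no measurement" apart from a true value.

```python
# Generic illustration (not the LSM-2 method): represent missing wearable
# samples with an explicit mask instead of imputing or dropping them.
import numpy as np

rng = np.random.default_rng(0)

# Fake minute-level heart-rate window; NaN marks minutes the device missed.
window = rng.normal(loc=70.0, scale=5.0, size=60)
window[10:25] = np.nan  # e.g., watch off-wrist for 15 minutes

mask = ~np.isnan(window)              # True where data exists, False where missing
filled = np.where(mask, window, 0.0)  # zero-fill only so array shapes stay fixed

# A model would consume both `filled` and `mask`, letting it distinguish
# a genuine zero reading from an absent measurement.
print(f"observed minutes: {mask.sum()} / {mask.size}")
```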

Launch HN: Confident AI (YC W25) – Open-source evaluation framework for LLM apps

Hi HN - we're Jeffrey and Kritin, and we're building Confident AI (https://confident-ai.com). This is the cloud platform for DeepEval (https://github.com/confident-ai/deepeval), our open-source package that helps engineers evaluate and unit-test LLM applications. Think Pytest for LLMs. We spent the past year building DeepEval with the goal of providing the best LLM evaluation developer experience, growing it to run over 600K evaluations daily in CI/CD pipelines of enterprises like BCG, Astr…
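"Pytest for LLMs" means an evaluation is written like an ordinary unit test: build a test case from an input and the model's output, attach a metric, and let the assertion fail the CI run if the score falls below a threshold. The sketch below follows DeepEval's documented usage, but exact class and metric names may vary across versions, and the relevancy metric needs an LLM judge configured (for example an OpenAI API key) to actually run.

```python
# A Pytest-style LLM test in the DeepEval mold; names follow the project's
# docs but should be checked against the installed version.
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric


def test_refund_answer_is_relevant():
    test_case = LLMTestCase(
        input="What is your refund policy?",
        actual_output="You can return any item within 30 days for a full refund.",
    )
    # Fails the test, like a normal assert, if the judged relevancy
    # score falls below the threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```

Because the test is just a function in a test file, it slots into the same CI/CD pipelines that already run a team's conventional unit tests.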