Programming Language Performance Benchmark

Intel Beats Apple M5 in Benchmarks, But There’s a Catch

Intel's Core Ultra X9 388H has beaten Apple's M5 in multi-core benchmarks, but the lead may last only days before Apple's M5 ...

GitHub

Qwen3 RAG Performance Benchmark

This project provides a comprehensive benchmarking framework for evaluating RAG (Retrieval-Augmented Generation) performance using Qwen3 embedding and reranker models of different sizes (0.6B, 4B, 8B) ...

WinBuzzer

AI Coding: Microsoft’s 7B X-Coder Outperforms 14B Rivals on Synthetic Data

Microsoft and Tsinghua University have developed a 7B-parameter AI coding model that outperforms 14B rivals using only ...

GitHub

SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents

Pre-built Docker Images Support - We merged PR #8, which enables instant use of pre-built Docker images, significantly reducing setup time and improving the evaluation ...

InfoQ

MIT's Recursive Language Models Improve Performance on Long-Context Tasks

Researchers at MIT's CSAIL published a design for Recursive Language Models (RLM), a technique for improving LLM performance on long-context tasks. RLMs use a programming environment to recursively ...
