# RMCP: R Model Context Protocol Server

**Version 0.3.7** - A comprehensive Model Context Protocol (MCP) server with 40 statistical analysis tools across 9 categories. RMCP enables AI assistants and applications to perform statistical modeling, econometric analysis, machine learning, time series analysis, and data science tasks through natural conversation.

🎉 Now including natural language formula building and intelligent error recovery!

## 🚀 Quick Start

```bash
pip install rmcp

# Check R packages are installed
rmcp check-r-packages

# Start the MCP server
rmcp start
```

That's it! RMCP is now ready to handle statistical analysis requests via the Model Context Protocol.

👉 See Working Examples → Copy-paste ready commands with real datasets!

## ✨ Features

### 📊 Comprehensive Statistical Analysis (40 Tools)

#### Regression & Correlation ✅

- **Linear Regression** (`linear_model`): OLS with robust standard errors, R², p-values
- **Logistic Regression** (`logistic_regression`): Binary classification with odds ratios and accuracy
- **Correlation Analysis** (`correlation_analysis`): Pearson, Spearman, and Kendall correlations

#### Time Series Analysis ✅

- **ARIMA Modeling** (`arima_model`): Autoregressive integrated moving average with forecasting
- **Time Series Decomposition** (`decompose_timeseries`): Trend, seasonal, remainder components
- **Stationarity Testing** (`stationarity_test`): ADF, KPSS, Phillips-Perron tests

#### Data Transformation ✅

- **Lag/Lead Variables** (`lag_lead`): Create time-shifted variables for analysis
- **Winsorization** (`winsorize`): Handle outliers by capping extreme values
- **Differencing** (`difference`): Create stationary series for time series analysis
- **Standardization** (`standardize`): Z-score, min-max, robust scaling

#### Statistical Testing ✅

- **T-Tests** (`t_test`): One-sample, two-sample, paired t-tests
- **ANOVA** (`anova`): Analysis of variance with Types I/II/III
- **Chi-Square Tests** (`chi_square_test`): Independence and goodness-of-fit
- **Normality Tests** (`normality_test`): Shapiro-Wilk, Jarque-Bera, Anderson-Darling

#### Descriptive Statistics ✅

- **Summary Statistics** (`summary_stats`): Comprehensive descriptives with grouping
- **Outlier Detection** (`outlier_detection`): IQR, Z-score, Modified Z-score methods
- **Frequency Tables** (`frequency_table`): Counts and percentages with sorting

#### Advanced Econometrics ✅

- **Panel Regression** (`panel_regression`): Fixed/random effects for longitudinal data
- **Instrumental Variables** (`instrumental_variables`): 2SLS with endogeneity testing
- **Vector Autoregression** (`var_model`): Multivariate time series modeling

#### Machine Learning ✅

- **K-Means Clustering** (`kmeans_clustering`): Unsupervised clustering with validation
- **Decision Trees** (`decision_tree`): Classification and regression trees
- **Random Forest** (`random_forest`): Ensemble methods with variable importance

#### Data Visualization ✅

- **Scatter Plots** (`scatter_plot`): Correlation plots with trend lines
- **Histograms** (`histogram`): Distribution analysis with density overlay
- **Box Plots** (`boxplot`): Quartile analysis with outlier detection
- **Time Series Plots** (`time_series_plot`): Temporal data visualization
- **Correlation Heatmaps** (`correlation_heatmap`): Matrix visualization
- **Regression Diagnostics** (`regression_plot`): Model validation plots

#### File Operations ✅

- **CSV Import** (`read_csv`): Flexible data loading with parsing options and URL support
- **Excel Import** (`read_excel`): Read .xlsx/.xls files with sheet and range selection
- **JSON Import** (`read_json`): Convert JSON to tabular format with flattening
- **CSV Export** (`write_csv`): Data export with formatting control
- **Dataset Information** (`data_info`): Comprehensive data structure analysis
- **Data Filtering** (`filter_data`): Complex conditional data selection

#### Natural Language & User Experience ✅

- **Formula Builder** (`build_formula`): Convert natural language to R formulas
- **Formula Validation** (`validate_formula`): Check formula syntax against data
- **Error Recovery** (`suggest_fix`): Intelligent error diagnosis and solutions
- **Data Validation** (`validate_data`): Pre-analysis data quality checks
- **Example Datasets** (`load_example`): Built-in datasets for learning and testing

### Production Ready

- **MCP Protocol**: Full JSON-RPC 2.0 compliance
- **Transport Agnostic**: stdio, HTTP, WebSocket support
- **Error Handling**: Comprehensive error reporting and validation
- **Security**: Safe R execution with controlled environment

## 🎯 Real-World Usage

RMCP works through natural conversation with AI assistants.
Here's how users actually interact with it:

### Business Analysis

**You ask Claude:**

> "I have sales data and marketing spend. Can you analyze the ROI and tell me how effective our marketing is?"

**Claude responds:**

> "Based on the regression analysis, every $1 spent on marketing generates $4.70 in additional sales. The relationship is highly significant (p < 0.001) and explains 97.9% of sales variation."

### Economic Research

**You ask Claude:**

> "I have GDP growth and unemployment data for several countries. Can you test if this supports Okun's Law?"

**Claude responds:**

> "The correlation analysis shows strong support for Okun's Law with r = -0.944. Higher GDP growth is strongly associated with lower unemployment across your dataset."

### Data Science

**You ask Claude:**

> "Help me build a customer churn prediction model using tenure and monthly charges."

**Claude responds:**

> "I've built a logistic regression model that achieves 100% accuracy. Longer tenure significantly reduces churn risk (11.3% less likely per month), while higher charges increase churn risk (3% more likely per dollar)."

### Natural Language Formula Building

**You ask Claude:**

> "I want to predict sales from marketing spend and customer satisfaction scores. Can you help me build the right formula?"

**Claude responds:**

> "I'll help you build that formula. Using the natural language formula builder, I converted your description to: `sales ~ marketing_spend + customer_satisfaction`. This creates a multiple regression model with sales as the outcome and both marketing spend and satisfaction as predictors. I can also suggest alternatives like interaction terms if needed."

### Intelligent Error Recovery

**You ask Claude:**

> "I'm getting an error about missing R packages when I try to run my analysis."

**Claude responds:**

> "I can help diagnose that error. The issue is that the 'forecast' package is missing, which is required for Time Series Forecasting. You can fix this by running: `install.packages("forecast")`. I can also check all your R packages and install everything RMCP needs at once."

## 📈 Visual Analytics - See Plots Directly in Claude

**New in v0.3.7**: All visualization tools now display plots directly in your Claude conversation - no more file management!

**You ask Claude:**

> "Create a correlation heatmap of my sales, marketing, and customer satisfaction data"

**Claude responds with:**

- 📊 **Interactive heatmap displayed inline** showing correlation strengths with color coding
- 📋 **Statistical analysis**: correlation matrix with exact values and significance tests
- 💡 **Insights**: "Strong positive correlation (r=0.89) between marketing and sales suggests effective targeting"

**Supported Visual Tools:**

- 🔥 **Correlation Heatmaps**: Color-coded correlation matrices with statistical values
- 📈 **Scatter Plots**: Trend lines, grouping, and regression analysis
- 📊 **Histograms**: Distribution analysis with density overlays
- 📦 **Box Plots**: Outlier detection and quartile analysis
- ⏱️ **Time Series**: Trend analysis with forecasting
- 🔍 **Regression Diagnostics**: 4-panel diagnostic plots for model validation

**Key Benefits:**

- ✅ **Immediate visual feedback** - see results instantly
- ✅ **No file management** - plots appear in conversation
- ✅ **Combined analysis** - statistics + visualizations together
- ✅ **Professional quality** - publication-ready plots with ggplot2
- ✅ **Optional saving** - can still export to files when needed

## 📊 Validated User Scenarios

RMCP has been tested with real-world scenarios achieving 100% success rate:

- ✅ **Business Analysts**: Sales forecasting with 97.9% R², $4.70 ROI per marketing dollar
- ✅ **Economists**: Macroeconomic analysis showing Okun's Law (r=-0.944)
- ✅ **Data Scientists**: Customer churn prediction with 100% accuracy
- ✅ **Researchers**: Treatment effect analysis with significant results (p<0.001)

## 🔧 Installation & Setup

### Prerequisites

- Python 3.9+
- R 4.0+ with required packages (see below)

### R Package Requirements

RMCP requires the following R packages. Install all at once with:

```r
# Install all required packages (recommended)
install.packages(c(
  # Core statistical packages
  "jsonlite", "plm", "lmtest", "sandwich", "AER", "dplyr",
  # Time series analysis
  "forecast", "vars", "urca", "tseries",
  # Statistical testing
  "nortest", "car",
  # Machine learning
  "rpart", "randomForest",
  # Data visualization
  "ggplot2", "gridExtra", "tidyr", "rlang"
), repos = "https://cran.rstudio.com/")
```

**Minimum Core Packages** (basic functionality only):

```r
install.packages(c("jsonlite", "plm", "lmtest", "sandwich", "AER"))
```

**Feature-Specific Packages:**

- **Time Series Analysis**: `forecast`, `vars`, `urca`, `tseries`
- **Machine Learning**: `rpart`, `randomForest`
- **Data Visualization**: `ggplot2`, `gridExtra`, `tidyr`, `rlang`
- **Statistical Testing**: `nortest`, `car`
- **Data Manipulation**: `dplyr`

💡 **Tip**: Install all packages first to avoid errors. Missing packages will cause specific tools to fail with clear error messages.
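To see ahead of time which of the packages listed above are missing, a check along the following lines may help. This is an illustrative sketch only, not how `rmcp check-r-packages` is implemented: the helper names (`find_missing`, `install_command`, `installed_r_packages`) are hypothetical, and the only assumption about the environment is that `Rscript` is on the `PATH`.

```python
import shutil
import subprocess

# Package names copied from the install command in this section.
REQUIRED_R_PACKAGES = [
    "jsonlite", "plm", "lmtest", "sandwich", "AER", "dplyr",
    "forecast", "vars", "urca", "tseries",
    "nortest", "car",
    "rpart", "randomForest",
    "ggplot2", "gridExtra", "tidyr", "rlang",
]


def find_missing(required, installed):
    """Return the required packages that are not installed, keeping their order."""
    installed_set = set(installed)
    return [pkg for pkg in required if pkg not in installed_set]


def install_command(missing):
    """Build the R install.packages() call for the given package names."""
    quoted = ", ".join(f'"{pkg}"' for pkg in missing)
    return f'install.packages(c({quoted}), repos = "https://cran.rstudio.com/")'


def installed_r_packages():
    """Ask R for its installed packages; return None if Rscript is unavailable."""
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(
        ["Rscript", "-e", "cat(rownames(installed.packages()), sep='\\n')"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()
```

Calling `find_missing(REQUIRED_R_PACKAGES, installed_r_packages())` (after checking that the second value is not `None`) and passing the result to `install_command` yields a line that can be pasted into an R session.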
### Install via pip

```bash
pip install rmcp
```

### Development Installation

```bash
git clone https://github.com/finite-sample/rmcp.git
cd rmcp
pip install -e ".[dev]"
```

### With Claude Desktop

Add to your Claude Desktop MCP configuration:

```json
{
  "mcpServers": {
    "rmcp": {
      "command": "rmcp",
      "args": ["start"],
      "env": {}
    }
  }
}
```

## 📚 Usage Examples

### Quick Start with Claude Desktop

Once RMCP is configured in Claude Desktop, you can immediately start asking for statistical analysis:

**Business Analysis Example:**

**You:** "I have sales data: month 1-12, sales [120, 135, 148, 165, 178, 185, 192, 210, 225, 240, 255, 270]. Can you analyze the trend and predict next 3 months?"

**Claude:** "I'll analyze your sales trend using linear regression and forecasting."

*[Claude automatically calls `linear_model` and then `arima_model`]*

**Claude:** "Analysis shows strong growth trend with 12.5 units/month increase (R²=0.99). Predicted sales for next 3 months: 285, 298, 311 units with 95% confidence intervals."

**Scientific Research Example:**

**You:** "I want to test if a new drug is effective. Control group: [2.1, 2.3, 2.0, 2.4, 2.2]. Treatment group: [3.1, 3.4, 3.0, 3.3, 3.2]. Please run appropriate statistical test."

**Claude:** "I'll perform a two-sample t-test to compare the groups."

*[Claude calls `t_test` tool]*

**Claude:** "Results: Treatment group mean (3.2) significantly higher than control (2.2), p < 0.001, 95% CI: [0.8, 1.2]. Strong evidence of treatment effect."

### Command Line Interface

```bash
# Start MCP server (stdio transport)
rmcp start

# Check version and available tools
rmcp --version

# Development server with debug logging
rmcp start --log-level DEBUG
```

### Direct Tool Usage (Advanced)

For developers building MCP clients or testing tools directly:

```python
import asyncio

from rmcp.core.server import create_server
from rmcp.tools.regression import linear_model


async def main():
    # Create server and context
    server = create_server()
    context = server.create_context("test-1", "tools/call")

    # Call tool directly (tools are coroutines, so they must be awaited)
    result = await linear_model(context, {
        "data": {
            "sales": [100, 120, 140, 160, 180],
            "advertising": [10, 15, 20, 25, 30],
        },
        "formula": "sales ~ advertising",
    })

    print(f"Advertising effectiveness: ${result['coefficients']['advertising']:.2f} per dollar")
    print(f"Model explains {result['r_squared']:.1%} of variance")


# `await` is only valid inside a coroutine, so run it through an event loop
asyncio.run(main())
```

### MCP Protocol Example

Testing with raw JSON-RPC messages:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "correlation_analysis",
    "arguments": {
      "data": {
        "sales": [100, 150, 200, 250, 300],
        "marketing": [10, 20, 30, 40, 50],
        "satisfaction": [7.5, 8.0, 8.5, 9.0, 9.5]
      },
      "method": "pearson"
    }
  }
}
```

Response:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [{
      "type": "text",
      "text": {
        "correlation_matrix": {
          "sales": {"marketing": 1.0, "satisfaction": 0.996},
          "marketing": {"sales": 1.0, "satisfaction": 0.996},
          "satisfaction": {"sales": 0.996, "marketing": 0.996}
        },
        "significance_tests": {
          "sales_marketing": 0.0,
          "sales_satisfaction": 0.000056,
          "marketing_satisfaction": 0.000056
        }
      }
    }]
  }
}
```

## 🔬 Advanced Usage Scenarios

### Time Series Forecasting

**Business Scenario: Sales Forecasting**

**You:** "I have monthly sales data for 2 years: [150, 162, 178, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330, 345, 360, 375, 390, 405, 420, 435, 450, 465, 480, 495]. Can you forecast next 6 months?"

**Claude:** "I'll analyze the time series and create forecasts using ARIMA modeling."

*[Claude calls `decompose_timeseries` and `arima_model`]*

**Claude:** "Strong upward trend detected. ARIMA(1,1,1) model selected. Forecasted sales: 510, 525, 540, 555, 570, 585 with confidence intervals."

### Panel Data Analysis

**Economic Research: Income and Education**

**You:** "I have panel data on income and education across countries and years. Can you analyze if education affects income while controlling for country differences?"

**Claude:** "I'll run a panel regression with fixed effects to control for country characteristics."

*[Claude calls `panel_regression`]*

**Claude:** "Results show each additional year of education increases income by $2,450 (p<0.001), controlling for unobserved country characteristics. R²=0.87 within countries."

### Machine Learning Workflow

**Customer Segmentation Analysis**

**You:** "Help me segment customers based on: age [25,35,45,30,40], spending [1200,2500,800,1800,3200], frequency [12,25,8,15,30]. How many segments should I use?"

**Claude:** "I'll perform k-means clustering to identify customer segments."

*[Claude calls `kmeans_clustering` with different k values]*

**Claude:** "Optimal segmentation shows 3 distinct customer groups:

- Young High-Value (age 30, spend $2650, freq~27)
- Mature Moderate (age 40, spend $1500, freq~13)
- Budget Conscious (age 45, spend $800, freq~8)"

### Statistical Testing Workflow

**A/B Test Analysis**

**You:** "I ran an A/B test. Control conversion: [0.12, 0.11, 0.13, 0.10, 0.14]. Treatment conversion: [0.18, 0.19, 0.17, 0.20, 0.16]. Is the difference significant?"

**Claude:** "I'll run appropriate tests to evaluate your A/B test results."
*[Claude calls `t_test` and `normality_test`]*

**Claude:** "Treatment significantly outperforms control:

- Control: 12.0% conversion rate
- Treatment: 18.0% conversion rate
- Lift: +50% improvement (p=0.003, 95% CI: [2.8%, 9.2%])
- Power analysis: 89% power to detect this effect size"

## 📋 Complete Tool Reference

### Regression & Modeling

| Tool | Purpose | Key Outputs |
|------|---------|-------------|
| `linear_model` | OLS regression | R², coefficients, p-values, diagnostics |
| `logistic_regression` | Binary/categorical outcomes | Odds ratios, accuracy, ROC |
| `panel_regression` | Longitudinal data | Fixed/random effects, within R² |
| `instrumental_variables` | Causal inference | 2SLS estimates, endogeneity tests |

### Time Series Analysis

| Tool | Purpose | Key Outputs |
|------|---------|-------------|
| `arima_model` | Forecasting | Predictions, confidence intervals, AIC |
| `decompose_timeseries` | Trend/seasonal analysis | Components, seasonality strength |
| `stationarity_test` | Unit root testing | ADF, KPSS, PP test statistics |
| `var_model` | Multivariate series | IRF, FEVD, Granger causality |

### Statistical Testing

| Tool | Purpose | Key Outputs |
|------|---------|-------------|
| `t_test` | Mean comparisons | t-statistic, p-value, confidence intervals |
| `anova` | Group differences | F-statistic, effect sizes, post-hoc |
| `chi_square_test` | Independence/goodness-of-fit | χ² statistic, Cramér's V |
| `normality_test` | Distribution testing | Shapiro-Wilk, Jarque-Bera p-values |

### Data Analysis

| Tool | Purpose | Key Outputs |
|------|---------|-------------|
| `correlation_analysis` | Association strength | Correlation matrix, significance tests |
| `summary_stats` | Descriptive statistics | Mean, median, SD, quartiles |
| `outlier_detection` | Anomaly identification | Outlier indices, methods comparison |
| `frequency_table` | Categorical analysis | Counts, percentages, sorted tables |

## 🧪 Testing & Validation

RMCP includes comprehensive testing with realistic scenarios:

```bash
# Run all user scenarios (should show 100% pass rate)
python tests/realistic_scenarios.py

# Run development test script
bash src/rmcp/scripts/test.sh
```

**Current Test Coverage:**

- ✅ **MCP Interface**: 100% success rate (5/5 tests) - Validates actual Claude Desktop integration
- ✅ **User Scenarios**: 100% success rate (4/4 tests) - Validates real-world usage patterns
- ✅ **Conversational Examples**: All documented examples tested and verified working

## 🏗️ Architecture

RMCP is built with production best practices:

- **Clean Architecture**: Modular design with clear separation of concerns
- **MCP Compliance**: Full Model Context Protocol specification support
- **Transport Layer**: Pluggable transports (stdio, HTTP, WebSocket)
- **R Integration**: Safe subprocess execution with JSON serialization
- **Error Handling**: Comprehensive error reporting and recovery
- **Security**: Controlled R execution environment

```
src/rmcp/
├── core/          # MCP server core
├── tools/         # Statistical analysis tools
├── transport/     # Communication layers
├── registries/    # Tool and resource management
└── security/      # Safe execution environment
```

## 🤝 Contributing

We welcome contributions! Please see our contributing guidelines.
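For contributors new to the codebase, the "subprocess execution with JSON serialization" pattern mentioned under Architecture can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not RMCP's actual implementation: `wrap_r_code` and `run_r` are hypothetical names, the R side is assumed to have `jsonlite` installed, and `Rscript` is assumed to be on the `PATH`.

```python
import json
import os
import subprocess
import tempfile


def wrap_r_code(r_body):
    """Wrap an R snippet so it reads JSON from stdin into `params`
    and prints whatever it assigns to `result` as JSON on stdout."""
    return "\n".join([
        'params <- jsonlite::fromJSON(readLines(file("stdin"), warn = FALSE))',
        r_body,
        "cat(jsonlite::toJSON(result, auto_unbox = TRUE))",
    ])


def run_r(r_body, params, timeout=30):
    """Run the wrapped snippet in a separate Rscript process and parse its output."""
    # Write the script to a temporary file so multi-line R code needs no shell quoting.
    with tempfile.NamedTemporaryFile("w", suffix=".R", delete=False) as script:
        script.write(wrap_r_code(r_body))
        path = script.name
    try:
        proc = subprocess.run(
            ["Rscript", path],
            input=json.dumps(params),  # arguments travel as one JSON line on stdin
            capture_output=True, text=True, timeout=timeout,
        )
    finally:
        os.unlink(path)
    if proc.returncode != 0:
        raise RuntimeError(f"R execution failed: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```

With R available, a call such as `run_r("result <- list(mean = mean(params$x))", {"x": [1, 2, 3]})` would be expected to return a dictionary holding the computed mean. Keeping R in a separate process means a crash or timeout in R surfaces as a Python exception rather than taking the server down.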
### Development Setup

```bash
git clone https://github.com/finite-sample/rmcp.git
cd rmcp
pip install -e ".[dev]"
pre-commit install
```

### Running Tests

RMCP includes comprehensive organized testing:

```bash
# Run all tests in logical order
python run_tests.py

# Run specific test categories
python tests/unit/test_new_tools.py                    # Unit tests
python tests/integration/test_mcp_interface.py         # Integration tests
python tests/e2e/test_claude_desktop_scenarios.py      # End-to-end tests

# Run pytest (if available)
pytest tests/unit/ -v                                  # Unit tests only
```

## 📄 License

MIT License - see LICENSE file for details.

## 🛠️ Troubleshooting

### Quick Fixes for Common Issues

**R not found:**

```bash
# Check R installation
R --version

# Install R if missing (macOS)
brew install r

# Install R (Ubuntu)
sudo apt-get install r-base
```

**Missing R packages:**

First, check which packages are missing:

```bash
rmcp check-r-packages
```

Then install missing packages in R:

```r
# Install all RMCP packages (recommended)
install.packages(c(
  "jsonlite", "plm", "lmtest", "sandwich", "AER", "dplyr",
  "forecast", "vars", "urca", "tseries", "nortest", "car",
  "rpart", "randomForest", "ggplot2", "gridExtra", "tidyr", "rlang"
), repos = "https://cran.rstudio.com/")

# Or install just core packages (limited functionality)
install.packages(c("jsonlite", "plm", "lmtest", "sandwich", "AER"))
```

**Package installation fails:**

- On Ubuntu/Debian: `sudo apt-get install r-base-dev libcurl4-openssl-dev libssl-dev libxml2-dev`
- On macOS with Homebrew: `brew install r`
- On Windows: Download from CRAN

**MCP connection issues:**

```bash
# Test server directly
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | rmcp start

# Check Claude Desktop MCP configuration
# Ensure rmcp is in PATH:
which rmcp
```

**For detailed troubleshooting:** See docs/troubleshooting.md

## 🙋 Support

## 🎉 Acknowledgments

RMCP builds on the excellent work of:

- Model Context Protocol specification
- R Project statistical computing environment
- The broader open-source statistical computing community

**Ready to analyze data like never before?** Install RMCP and start running sophisticated statistical analyses through AI assistants today! 🚀